[MaskRCNN] Source Code Series 3: The RPN in train & test

Contents

Preset anchors

get_anchors

RPN model

build_rpn_model

ProposalLayer

batch_slice

apply_box_deltas_graph

clip_boxes_graph

NMS

A brief summary


Preset anchors

The RPN is defined in the build function of the MaskRCNN() class in mrcnn/model.py.

        # Anchors
        if mode == "training":
            anchors = self.get_anchors(config.IMAGE_SHAPE)  # shape is [261888, 4]: anchors at every scale, with coordinates normalized to the image
            # Duplicate across the batch dimension because Keras requires it
            # TODO: can this be optimized to avoid duplicating the anchors?
            anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
            # A hack to get around Keras's bad support for constants
            anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
        else:
            anchors = input_anchors

This post covers the RPN for both train and test. From the code we can see that the anchors obtained inside the training branch and the anchors used when not in training are actually the same: both ultimately come from the get_anchors function.

Let's take a look at get_anchors.

get_anchors

    def get_anchors(self, image_shape):
        """Returns anchor pyramid for the given image size."""
        backbone_shapes = compute_backbone_shapes(self.config, image_shape)
        # Cache anchors and reuse if image shape is the same
        if not hasattr(self, "_anchor_cache"):
            self._anchor_cache = {}
        if not tuple(image_shape) in self._anchor_cache:
            # Generate Anchors
            a = utils.generate_pyramid_anchors(
                self.config.RPN_ANCHOR_SCALES, # [32,64,128,256,512]
                self.config.RPN_ANCHOR_RATIOS, # [0.5,1,2]
                backbone_shapes, # [[256,256],[128,128],[64,64],[32,32],[16,16]]
                self.config.BACKBONE_STRIDES, # [4,8,16,32,64]
                self.config.RPN_ANCHOR_STRIDE) # 1
            # Keep a copy of the latest anchors in pixel coordinates because
            # it's used in inspect_model notebooks.
            # TODO: Remove this after the notebook are refactored to not use it
            self.anchors = a # shape is [261888,4]
            # Normalize coordinates
            self._anchor_cache[tuple(image_shape)] = utils.norm_boxes(a, image_shape[:2])
        return self._anchor_cache[tuple(image_shape)]

Here backbone_shapes is computed from the image shape and the config. From the previous post we know the RPN uses the five feature maps P2, P3, P4, P5, and P6, which correspond exactly to these backbone_shapes; the concrete values are given in the code comments above.
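As a quick sanity check on those numbers, here is the arithmetic that compute_backbone_shapes boils down to for the default 1024x1024 input (a minimal sketch of the computation, not the library function itself):

import math

# Mirrors what compute_backbone_shapes produces for a ResNet backbone,
# assuming the default square 1024x1024 input and the FPN strides above.
image_shape = (1024, 1024)
backbone_strides = [4, 8, 16, 32, 64]

backbone_shapes = [
    (math.ceil(image_shape[0] / s), math.ceil(image_shape[1] / s))
    for s in backbone_strides
]
print(backbone_shapes)
# [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]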

Next comes generate_pyramid_anchors, which I won't walk through again here; see the earlier post.

Finally, the anchors are normalized once; see norm_boxes:

def norm_boxes(boxes, shape):
    """Converts boxes from pixel coordinates to normalized coordinates. # 只要是normalize,就表明所有的坐标都进行过归一化了
    boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
    shape: [..., (height, width)] in pixels

    Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
    coordinates it's inside the box.

    Returns:
        [N, (y1, x1, y2, x2)] in normalized coordinates
    """
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)

Note that this normalization does not force values into [0, 1]; it is a rescaling relative to the image height and width. Normalized values can therefore come out as, say, -0.04 or 1.15, slightly outside the 0-1 range, since the generated preset anchor boxes can extend beyond the original image.

So the final preset anchors returned lie roughly in the range -0.2 to 1.25 and have shape (261888, 4).
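To make this concrete, here is a minimal NumPy rerun of the norm_boxes arithmetic, assuming a 1024x1024 image and two illustrative boxes (one inside the image, one hanging over the border):

import numpy as np

# A numeric check of norm_boxes: an anchor that pokes outside the image
# maps to normalized coordinates slightly outside [0, 1], as described above.
boxes = np.array([
    [0,   0,   64,   64],    # fully inside the image
    [-45, -45, 1069, 1069],  # a large anchor hanging over the border
])
h, w = 1024, 1024
scale = np.array([h - 1, w - 1, h - 1, w - 1])
shift = np.array([0, 0, 1, 1])
print(((boxes - shift) / scale).astype(np.float32))
# [[ 0.      0.      0.0616  0.0616]
#  [-0.044  -0.044   1.044   1.044 ]]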

RPN model

        rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE, # 1
                              len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE) # 3, 256

build_rpn_model

From the snippet above in the build function, let's step into build_rpn_model.

def build_rpn_model(anchor_stride, anchors_per_location, depth):  # 1, 3, 256
    input_feature_map = KL.Input(shape=[None, None, depth],
                                 name="input_rpn_feature_map")
    # in: (?, ?, ?, 256), 3, 1 -> out: [(?, ?, 2), (?, ?, 2), (?, ?, 4)]
    outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")

Next, let's dig into rpn_graph.

def rpn_graph(feature_map, anchors_per_location, anchor_stride): # (?,?,?,256),3,1
    """Builds the computation graph of Region Proposal Network.

    feature_map: backbone features [batch, height, width, depth]
    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).

    Returns:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.
    """
    # TODO: check if stride of 2 causes alignment issues if the feature map
    # is not even.
    # Shared convolutional base of the RPN
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)

    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)  #[batch, H * W * anchors_per_location, 2]

    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    # Bounding box refinement. [batch, H, W, anchors per location * depth]
    # where depth is [x, y, log(w), log(h)]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x) #[batch, H * W * anchors_per_location, 4]

    return [rpn_class_logits, rpn_probs, rpn_bbox]

First, the input arguments:

feature_map: one of the feature maps in rpn_feature_maps = [P2, P3, P4, P5, P6] from the previous post; they are fed into rpn_graph one at a time

anchors_per_location: 3

anchor_stride: 1

Taking P2 as the example, feature_map has shape (1, 256, 256, 256).

rpn_graph returns three outputs: rpn_class_logits, rpn_probs, and rpn_bbox. Taking P2 as the example and applying the same reasoning to P3, P4, P5, and P6, we get the following table:

        rpn_class_logits     rpn_probs            rpn_bbox
P2      (1,256*256*3,2)      (1,256*256*3,2)      (1,256*256*3,4)
P3      (1,128*128*3,2)      (1,128*128*3,2)      (1,128*128*3,4)
P4      (1,64*64*3,2)        (1,64*64*3,2)        (1,64*64*3,4)
P5      (1,32*32*3,2)        (1,32*32*3,2)        (1,32*32*3,4)
P6      (1,16*16*3,2)        (1,16*16*3,2)        (1,16*16*3,4)

Back in build_rpn_model, we can see that a new Keras model is constructed: its input is a feature map, and its outputs are the corresponding [rpn_class_logits, rpn_probs, rpn_bbox].

Jumping back up one level, we land in the following code in build:

        # Feed each feature map through the RPN and collect the outputs.
        layer_outputs = []  # list of lists
        for p in rpn_feature_maps:
            layer_outputs.append(rpn([p]))

        output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
        outputs = list(zip(*layer_outputs))  # 3 groups: logits first, then probs, then the 4 box coordinates
        outputs = [KL.Concatenate(axis=1, name=n)(list(o))
                   for o, n in zip(outputs, output_names)]

        rpn_class_logits, rpn_class, rpn_bbox = outputs

Each feature map is simply pushed through the shared RPN model in turn, and the per-level outputs are then concatenated, so that finally:

rpn_class_logits: shape (1, 261888, 2);

rpn_class: shape (1, 261888, 2), after softmax;

rpn_bbox: shape (1, 261888, 4).

Here 261888 = 256*256*3 + 128*128*3 + 64*64*3 + 32*32*3 + 16*16*3.
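A one-liner confirms the total:

# Quick check that the anchor count matches the five pyramid levels:
feature_sizes = [256, 128, 64, 32, 16]   # P2..P6 feature-map sides
anchors_per_cell = 3
print(sum(s * s * anchors_per_cell for s in feature_sizes))  # 261888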

ProposalLayer

Of the 261888 anchor boxes, even with the refinements predicted above, most cannot be used directly; a selection step is needed.

        # Generate proposals
        # Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
        # and zero padded.
        proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
            else config.POST_NMS_ROIS_INFERENCE
        rpn_rois = ProposalLayer(
            proposal_count=proposal_count,
            nms_threshold=config.RPN_NMS_THRESHOLD,
            name="ROI",
            config=config)([rpn_class, rpn_bbox, anchors])
            # Output: [batch, N=2000, (y1, x1, y2, x2)]. After NMS the result
            # is zero-padded up to 2000 if fewer survive (truncated if more),
            # and boxes are clipped so none extend beyond the image.

Here proposal_count is set to 2000.

Next we enter the ProposalLayer layer.

class ProposalLayer(KE.Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs): 

Initialization arguments:

proposal_count: 2000

nms_threshold: 0.7

config: config

name: "ROI"

The layer is then called directly:

    def call(self, inputs):  #[rpn_class, rpn_bbox, anchors]

The input is:

inputs: [rpn_class, rpn_bbox, anchors]

Rather than explaining line by line, from here on I'll explain one code block at a time.

    def call(self, inputs):  #[rpn_class, rpn_bbox, anchors]
        # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
        scores = inputs[0][:, :, 1]  # foreground scores after softmax, shape (1, 261888)
        # Box deltas [batch, num_rois, 4]
        deltas = inputs[1]  # the predicted deltas for the box transform
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]  # the preset anchor boxes

The above is just simple variable assignment. As for the multiplication by self.config.RPN_BBOX_STD_DEV: as described at https://blog.csdn.net/u013066730/article/details/102504951#build_rpn_targets, the deltas were divided by this value when the training targets were built, so the multiplication here cancels out that effect.
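A tiny sketch of that round trip, assuming the default RPN_BBOX_STD_DEV of [0.1, 0.1, 0.2, 0.2]:

import numpy as np

# The training targets are divided by the std dev, so at inference the
# network's output must be multiplied by it before being applied to anchors.
RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])

true_delta = np.array([0.05, -0.02, 0.10, 0.08])  # (dy, dx, log(dh), log(dw))
target = true_delta / RPN_BBOX_STD_DEV            # what the RPN is trained to predict
recovered = target * RPN_BBOX_STD_DEV             # what ProposalLayer applies
assert np.allclose(recovered, true_delta)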

        # Improve performance by trimming to top anchors by score
        # and doing the rest on the smaller subset.
        # Take whichever is smaller: the number of input anchors or self.config.PRE_NMS_LIMIT (6000)
        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])

Since 261888 anchor boxes is far more than 6000, only the indices of the top 6000 scores are kept (the higher the probability, the more likely the anchor is foreground). batch_slice is explained below; for now, note that:

scores: the top 6000 foreground scores gathered by those indices; the higher the score, the more likely the anchor is foreground;

deltas: the top 6000 predicted offsets gathered by those indices, produced by build_rpn_model above, shape (6000, 4);

pre_nms_anchors: the preset anchor boxes gathered by those indices, i.e. the anchors most likely to be foreground.
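As a toy illustration of this top-k selection (shapes shrunk down for readability; evaluate eagerly in TF2 or inside a session in TF1):

import tensorflow as tf

# Keep the indices of the two highest scores, then gather the
# matching delta rows, mirroring the snippet above.
scores = tf.constant([0.1, 0.9, 0.4, 0.7])
deltas = tf.constant([[0., 0., 0., 0.],
                      [1., 1., 1., 1.],
                      [2., 2., 2., 2.],
                      [3., 3., 3., 3.]])
ix = tf.nn.top_k(scores, k=2, sorted=True).indices  # -> [1, 3]
top_deltas = tf.gather(deltas, ix)                  # -> rows 1 and 3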

batch_slice

Here is a quick introduction to batch_slice, a function that will be used many times later. Take [scores, ix] as the example: scores has shape (1, 261888), ix has shape (1, 6000), the function lambda x, y: tf.gather(x, y) is passed in as graph_fn, and batch_size is the number of images assigned to each GPU.

def batch_slice(inputs, graph_fn, batch_size, names=None):
    if not isinstance(inputs, list):
        inputs = [inputs]

    outputs = []
    for i in range(batch_size):
        inputs_slice = [x[i] for x in inputs]  # take the i-th sample from every input; here: sample i's scores and its top-6000 score indices
        output_slice = graph_fn(*inputs_slice)  # apply graph_fn to this sample, e.g. gather the scores at those indices
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)
    # Change outputs from a list of slices where each is
    # a list of outputs to a list of outputs and each has
    # a list of slices
    outputs = list(zip(*outputs))

    if names is None:
        names = [None] * len(outputs)

    result = [tf.stack(o, axis=0, name=n)
              for o, n in zip(outputs, names)]
    if len(result) == 1:
        result = result[0]

    return result

So *inputs_slice splits scores[i] and ix[i] out sample by sample along the batch dimension; graph_fn is applied to each sample in turn and its result is stored in output_slice; the per-sample results are then stacked back along the batch dimension to produce the final result.
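Here is a toy call of the batch_slice defined above, assuming TensorFlow is imported and the batch holds 2 images:

import tensorflow as tf

# For each sample, gather that sample's scores at its own indices.
scores = tf.constant([[0.1, 0.9, 0.4],
                      [0.8, 0.2, 0.6]])   # [batch=2, num_anchors=3]
ix = tf.constant([[1, 2], [0, 2]])        # per-sample top-2 indices
top = batch_slice([scores, ix], lambda s, i: tf.gather(s, i), batch_size=2)
# top -> [[0.9, 0.4], [0.8, 0.6]], shape [2, 2]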

With batch_slice covered, let's continue.

        # Apply deltas to anchors to get refined anchors.
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

This snippet combines the deltas produced by build_rpn_model with the preset anchor boxes, finally yielding corrected anchor boxes; we will call these corrected anchor boxes the predicted boxes.

It relies on apply_box_deltas_graph(), so let's look at that function.

apply_box_deltas_graph

def apply_box_deltas_graph(boxes, deltas):
    """Applies the given deltas to the given boxes.
    boxes: [N, (y1, x1, y2, x2)] boxes to update
    deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
    """
    # Convert to y, x, h, w
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Apply deltas
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])
    # Convert back to y1, x1, y2, x2
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
    return result

This is just the standard box-refinement computation. Written out, the formulas are:

    y' = A_y + d_y * A_h
    x' = A_x + d_x * A_w
    h' = A_h * exp(d_h)
    w' = A_w * exp(d_w)

where A_x, A_y, A_w, A_h are the center coordinates and width/height of the preset anchor box, and d_x, d_y, d_w, d_h are the predicted offsets. Correcting the preset anchor boxes with these offsets yields the final predicted boxes that the RPN needs.
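A single-box numeric check of these formulas (a NumPy stand-in for apply_box_deltas_graph, with made-up deltas, assuming normalized coordinates):

import numpy as np

# Anchor of height/width 0.2 centered at (0.5, 0.5), shifted and
# resized by the predicted deltas.
y1, x1, y2, x2 = 0.4, 0.4, 0.6, 0.6
dy, dx, dlog_h, dlog_w = 0.5, -0.5, np.log(2.0), np.log(0.5)

h, w = y2 - y1, x2 - x1                        # 0.2, 0.2
cy, cx = y1 + 0.5 * h, x1 + 0.5 * w            # 0.5, 0.5
cy, cx = cy + dy * h, cx + dx * w              # 0.6, 0.4
h, w = h * np.exp(dlog_h), w * np.exp(dlog_w)  # 0.4, 0.1
refined = (cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w)
print(refined)  # (0.4, 0.35, 0.8, 0.45)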

        # Clip to image boundaries. Since we're in normalized coordinates,
        # clip to 0..1 range. [batch, N, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

The code above exists because predicted boxes can fall outside the 0-1 range, which is clearly invalid: a detection box can only appear inside the image. So values below 0 are forced to 0 and values above 1 are forced to 1, guaranteeing all predicted boxes lie inside the image.

clip_boxes_graph

def clip_boxes_graph(boxes, window):
    """
    boxes: [N, (y1, x1, y2, x2)]
    window: [4] in the form y1, x1, y2, x2
    """
    # Split
    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
    # Clip
    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped

This simply forces any box coordinates outside the 0-1 range back into the [0, 1] interval.
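For the window [0, 0, 1, 1], clip_boxes_graph is equivalent to an element-wise clip (a NumPy sketch for intuition only):

import numpy as np

# Coordinates below 0 become 0, coordinates above 1 become 1.
boxes = np.array([[-0.05, 0.20, 0.40, 1.10]])
print(np.clip(boxes, 0.0, 1.0))  # [[0.   0.2  0.4  1. ]]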

NMS

With everything prepared, the NMS step can begin.

        # Non-max suppression
        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # Pad if needed
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])  # pad with zero rows at the bottom
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)  # final shape: [batch_size, N=2000, 4]

boxes: the predicted boxes, shape (1, 6000, 4);

scores: the foreground probabilities of the top 6000 boxes selected by score, in one-to-one correspondence with the predicted boxes, shape (1, 6000);

self.proposal_count: 2000, i.e. only 2000 foreground boxes are kept;

self.nms_threshold: 0.7.

tf.image.non_max_suppression() yields the indices of the boxes that survive, and gathering the predicted boxes at those indices gives proposals. This proposals tensor might come out with shape (?, 1500, 4) if only 1500 boxes qualify, but we require shape (?, 2000, 4), so it has to be zero-padded.
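A toy run of the nms() helper above, with proposal_count shrunk to 3 for illustration (run eagerly in TF2 or inside a session in TF1):

import tensorflow as tf

# Boxes 0 and 1 overlap heavily (IoU ~0.96 > 0.7), so box 1 is
# suppressed; padding then fills the result back up to 3 rows.
boxes = tf.constant([[0.0, 0.0, 0.50, 0.50],
                     [0.0, 0.0, 0.51, 0.51],   # near-duplicate of box 0
                     [0.5, 0.5, 1.00, 1.00]])
scores = tf.constant([0.9, 0.8, 0.7])
indices = tf.image.non_max_suppression(boxes, scores, max_output_size=3,
                                       iou_threshold=0.7)
proposals = tf.gather(boxes, indices)                  # 2 boxes survive
padding = tf.maximum(3 - tf.shape(proposals)[0], 0)
proposals = tf.pad(proposals, [(0, padding), (0, 0)])  # shape (3, 4), last row zeros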

The proposals finally returned are the rpn_rois.

That essentially wraps up the RPN; the RPN losses will be covered in a later post.

A brief summary

First, a convolutional network outputs a foreground/background confidence and a set of offsets for every preset box. The top 6000 preset boxes are then selected by foreground confidence, the offsets are applied to correct them into predicted boxes, and finally NMS is run on the predicted boxes to obtain the final proposals.
