Mask R-CNN Code

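This article walks through the network structure of Mask R-CNN, including the RPN, the network head, and the loss functions. It examines anchor generation, how the Proposal Layer works, and what the PyramidROIAlign layer does. It also covers the key data-processing steps, such as computing the loss functions for object classification, bounding-box regression, and mask prediction.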

matterport/Mask rcnn

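  • model.py is the main file that builds the network
  • utils.py contains the anchor-generation code, mainly the functions generate_anchors() and generate_pyramid_anchors()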

The RPN

scales:(32, 64, 128, 256, 512)

ratios:(0.5, 1, 2)

feature_shapes:[[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]

feature_strides:[4, 8, 16, 32, 64]

anchor_stride:1

generate_pyramid_anchors() calls generate_anchors() in a loop. scales, feature_shapes, and feature_strides are paired element by element, and each triple is passed to one generate_anchors() call.

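The input image is 1024x1024. After 4x downsampling the feature map is (256, 256); after 8x it is (128, 128), and so on. That is the correspondence between feature_strides and feature_shapes. scales gives the anchor size in input-image pixels at each FPN level: an anchor that spans the same number of feature-map cells on every level maps to a different size on the input image. On the (256, 256) level it covers 32x32 pixels. On the (128, 128) level the anchor spans the same number of cells, but each cell covers 2x as much of the image, so the anchor covers 64x64 pixels, and so on.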
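As a quick check of this correspondence, here is a minimal sketch that derives the feature-map shapes from the input size and the strides, and the anchor side length measured in feature-map cells. The helper name pyramid_feature_shapes is hypothetical (it is not taken from the repository), and the sketch assumes a square 1024x1024 input.

```python
import math

def pyramid_feature_shapes(image_shape, feature_strides):
    """Hypothetical helper: [height, width] of the feature map at each level."""
    return [[math.ceil(image_shape[0] / s), math.ceil(image_shape[1] / s)]
            for s in feature_strides]

scales = (32, 64, 128, 256, 512)
feature_strides = [4, 8, 16, 32, 64]

feature_shapes = pyramid_feature_shapes((1024, 1024), feature_strides)

# Anchor side length in feature-map cells: scale / stride.
# It is the same on every level, which is why the image-space size doubles
# each time the stride doubles.
cells_per_anchor = [scale / stride for scale, stride in zip(scales, feature_strides)]

print(feature_shapes)
print(cells_per_anchor)
```

With the configuration values listed above, every level ends up with an anchor that is 8 cells wide.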

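generate_anchors(): for the call generate_anchors(32, [0.5, 1, 2], [256, 256], 4, 1), anchors are generated on a 256x256 feature map. An anchor of scale 32 spans 8x8 cells on the feature map (32 / 4 = 8), which maps to 32x32 pixels on the input image. shifts_y and shifts_x slide a window over the elements of the 256x256 feature map with step anchor_stride. With anchor_stride set to 1 there are 256*256 positions, and with three ratios there are 256x256x3 anchors in total. Multiplying by feature_stride maps the positions back to input-image coordinates. Finally, the top-left and bottom-right corners of each box are computed from its center coordinates and its height and width.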

import numpy as np


def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    """
    scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
    ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
    shape: [height, width] spatial shape of the feature map over which
            to generate anchors.
    feature_stride: Stride of the feature map relative to the image in pixels.
    anchor_stride: Stride of anchors on the feature map. For example, if the
        value is 2 then generate anchors for every other feature map pixel.
    """
    # Get all combinations of scales and ratios
    # e.g. scales = 32 and ratios = [0.5, 1, 2]
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales = scales.flatten() # array([32, 32, 32])
    ratios = ratios.flatten() # array([ 0.5,  1. ,  2. ])

    # Enumerate heights and widths from scales and ratios
    heights = scales / np.sqrt(ratios) # array([ 45.254834,  32.,  22.627417])
    widths = scales * np.sqrt(ratios) # array([ 22.627417,  32. ,  45.254834])

    # Enumerate shifts in feature space, e.g. anchor_stride = 1, feature_stride = 4
    # here, shape = (256, 256)
    # shifts_y.shape = (256,), shifts_x.shape = (256,)
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride

    # shifts_x.shape = (256, 256)--->[[0, 1, ..., 255], [0, 1, ..., 255], ...] * 4
    # shifts_y.shape = (256, 256)--->[[0, 0, ..., 0], [1, 1, ..., 1], ...] * 4
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

    # Enumerate combinations of shifts, widths, and heights
    # box_widths.shape = (256*256, 3)----> each row is [22.627417, 32., 45.254834]
    # box_centers_x.shape = (256*256, 3)---> rows take the values [0, 1, ..., 255, 0, 1, ..., 255, ...] * 4
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

    # Reshape to get a list of (y, x) and a list of (h, w)
    # (256*256*3, 2)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])

    # Convert to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)
    # if generate_anchors(32, [0.5, 1, 2], [256, 256], 4, 1), return boxes.shape=(196608, 4)
    return boxes


def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.

    scales (32, 64, 128, 256, 512)
    ratios = [0.5, 1, 2]
    here, feature_shapes*feature_strides = [[1024, 1024], [1024, 1024], ...], input image is 1024x1024
    feature_shapes = 
    [[256 256]
     [128 128]
     [ 64  64]
     [ 32  32]
     [ 16  16]]
    feature_strides = [4, 8, 16, 32, 64]
    anchor_stride = 1
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)
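
To make the anchor counts concrete, the following minimal sketch tallies how many anchors each pyramid level contributes for the configuration values listed earlier. The helper anchors_per_level is hypothetical; it only mirrors the counting logic of np.arange(0, shape, anchor_stride) and does not build any boxes.

```python
import math

def anchors_per_level(shape, num_ratios, anchor_stride=1):
    """Hypothetical helper: number of anchors generated on one feature map."""
    # np.arange(0, n, step) yields ceil(n / step) positions along each axis
    rows = math.ceil(shape[0] / anchor_stride)
    cols = math.ceil(shape[1] / anchor_stride)
    return rows * cols * num_ratios

ratios = [0.5, 1, 2]
feature_shapes = [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]

counts = [anchors_per_level(shape, len(ratios)) for shape in feature_shapes]
total = sum(counts)

# height = scale / sqrt(ratio) and width = scale * sqrt(ratio),
# so every ratio keeps the anchor area at scale ** 2
areas = [(32 / math.sqrt(r)) * (32 * math.sqrt(r)) for r in ratios]

print(counts)  # [196608, 49152, 12288, 3072, 768]
print(total)   # 261888
```

The first entry matches the boxes.shape=(196608, 4) comment in generate_anchors(), and the total is the length of the array that generate_pyramid_anchors() returns for this configuration.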

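The ProposalLayer class in model.py implements the network's proposal layer, which sits between the RPN and the R-CNN head. Its inputs are rpn_probs and rpn_bbox, as annotated in the code. Its output has shape (None, self.proposal_count, 4), where proposal_count is the number of proposals and 4 is the number of box coordinates. The layer processes all proposals produced by the RPN: it selects the top_k candidate boxes by fg prob score, then applies non-maximum suppression to reduce the number of proposals further. One detail: the rpn_bbox values (dy, dx, log(dh), log(dw)) predicted by the RPN are used to refine the original coordinates (y1, x1, y2, x2) of the selected anchors. rpn_bbox predicts offsets, and the function that applies them is apply_box_deltas_graph(anchors, deltas).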
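
The three steps described above (top-k selection by score, delta refinement, non-maximum suppression) can be sketched in plain NumPy for a single image. This is an illustration under stated assumptions, not the layer's code: the layer itself works on batched TensorFlow tensors, and the function names apply_box_deltas and nms, as well as the toy anchors, deltas, and scores, are made up for this example.

```python
import numpy as np

def apply_box_deltas(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)]; deltas: [N, (dy, dx, log(dh), log(dw))]."""
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Shift the center by a fraction of the box size, then rescale the size
    center_y = center_y + deltas[:, 0] * height
    center_x = center_x + deltas[:, 1] * width
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)

def nms(boxes, scores, iou_threshold, max_keep):
    """Greedy non-max suppression; returns indices of the kept boxes."""
    order = np.argsort(-scores)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0 and len(keep) < max_keep:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with every remaining box
        yy1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        xx1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        yy2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        xx2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, yy2 - yy1) * np.maximum(0.0, xx2 - xx1)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the best box too much
        order = rest[iou <= iou_threshold]
    return keep

# Toy data (assumed values, for illustration only)
anchors = np.array([[0, 0, 10, 10], [0, 1, 10, 11], [20, 20, 30, 30]], dtype=np.float64)
deltas = np.zeros((3, 4))
deltas[2] = [0.1, 0.0, 0.0, 0.0]  # move the third box down by 10% of its height
scores = np.array([0.9, 0.8, 0.7])

top_k = 3
ix = np.argsort(-scores)[:top_k]                      # 1. keep the top-scoring anchors
refined = apply_box_deltas(anchors[ix], deltas[ix])   # 2. refine them with the deltas
keep = nms(refined, scores[ix], iou_threshold=0.7, max_keep=2)  # 3. suppress overlaps
proposals = refined[keep]
```

In this toy case the second box overlaps the first with IoU of about 0.82, so it is suppressed and the first and third boxes survive as proposals.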

class ProposalLayer(KE.Layer):
    """Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.

    Inputs:
        rpn_probs: [batch, anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]

    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

    def __init__(self, proposal_count, nms_threshold, anchors,
                 config=None, **kwargs):
        """
        proposal_count = 2000 or 1000 below, nms_threshold=0.7
        anchors: [N, (y1, x1, y2, x2)] anchors defined in image coordinates
        """
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold
        self.anchors = anchors.astype(np.float32)

    def call(self, inputs):
        # inputs is [rpn_class, rpn_bbox]
        # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
        scores = inputs[0][:, :, 1]
        # Box deltas [batch, num_rois, 4]
        deltas = inputs[1]
        # RPN_BBOX_STD_DEV = np.array([0.08, 0.08, 0.17, 0.17])
        # RPN_BBOX_STD_MEANS = np.array([0.02, 0.02, 0.01, 0.02])
        deltas = (deltas + np.reshape(self.config.RPN_BBOX_STD_MEANS, [1, 1, 4])) * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Base anchors
        anchors = self.anchors

        # Improve performance by trimming to top anchors by score
        # and doing the rest on the smaller subset.

        # self.anchors generated by utils.generate_pyramid_anchors
        # anchor.shape = [anchor_num, 4]
        pre_nms_limit = min(6000, self.anchors.shape[0])
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        anchors = utils.batch_slice(ix, lambda x: tf.gather(anchors, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])

        # Apply deltas to anchors to get refined anchors.
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
            