Mask R-CNN Source Code Walkthrough 2: Feature Maps and Anchor Generation

业余狙击手19

Article link: https://blog.csdn.net/sxlsxl119/article/details/103433066

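Mask R-CNN Source Code Walkthrough 1: Overall Architecture

Mask R-CNN Source Code Walkthrough 2: Feature Maps and Anchor Generation

Mask R-CNN Source Code Walkthrough 3: RPN, ProposalLayer, DetectionTargetLayer

Mask R-CNN Source Code Walkthrough 4-0: ROI Pooling and ROI Align Theory

Mask R-CNN Source Code Walkthrough 4: Network Heads

Mask R-CNN Source Code Walkthrough 5: Losses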

Contents

Mask R-CNN overview:

A) Feature maps and anchor generation

1. Bottom-up layers

2. Top-down layers and lateral connections

3. Anchor generation

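Mask R-CNN overview:

Mask R-CNN is a compact, flexible, general-purpose framework for object instance segmentation. It not only detects the objects in an image but also produces a high-quality segmentation mask for each one. It extends Faster R-CNN [1] by adding a branch that predicts an object mask in parallel with the existing bounding box recognition branch. The network is also easy to extend to other tasks, such as human pose estimation, that is, person keypoint detection. The framework achieved top results on a series of COCO challenge tasks, including instance segmentation, bounding-box object detection, and person keypoint detection.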

Reference articles:

Mask RCNN study notes

MaskRCNN source code walkthrough

The remarkable Mask RCNN

Paper notes: Mask R-CNN

Mask R-CNN: a personal understanding

Source code analyzed:

https://github.com/matterport/Mask_RCNN

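A) Feature maps and anchor generation

This article covers the following parts:

1. Bottom-up layers
2. Top-down layers and lateral connections
3. Anchor generation

1. Bottom-up layers

There is not much to say about this part: the ResNet backbone simply extracts a feature map at each stage (C1 to C5).

Structure diagram:

Source code: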

def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
    """Build a ResNet graph.
        architecture: Can be resnet50 or resnet101
        stage5: Boolean. If False, stage5 of the network is not created
        train_bn: Boolean. Train or freeze Batch Norm layers
    """
    # KL is keras.layers; BatchNorm, conv_block and identity_block are helpers
    # defined alongside this function in mrcnn/model.py of the repository.

    assert architecture in ["resnet50", "resnet101"]
    # Stage 1
    x = KL.ZeroPadding2D((3, 3))(input_image)
    x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNorm(name='bn_conv1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)      # C1: 256*256*64 (for a 1024*1024 input), i.e. 64 feature maps of size 256*256
    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)  # conv block (convolution on the shortcut); strides=(1, 1) here, so no downsampling
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)   # identity block: takes no strides argument, the shortcut is the input itself
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)   # C2: 256*256*256, i.e. 256 feature maps of size 256*256
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)   # C3: 128*128*512, i.e. 512 feature maps of size 128*128
    # Stage 4
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = {"resnet50": 5, "resnet101": 22}[architecture]
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    C4 = x    # C4: 64*64*1024, i.e. 1024 feature maps of size 64*64
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)  # C5: 32*32*2048, i.e. 2048 feature maps of size 32*32
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]
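
As a quick sanity check on the shapes noted in the comments, the sketch below derives the size of C1 to C5 from the cumulative stride of each stage in the resnet_graph code above. It assumes a square 1024*1024 input (which is what the 256*256 figure for C1 implies); the helper name `backbone_shapes` and the two lookup tables are made up for this illustration and are not part of the repository.

```python
# Cumulative downsampling factor and channel count of each stage output,
# read off resnet_graph (conv1 stride 2 followed by max-pool stride 2 gives 4, etc.).
STAGE_STRIDES = {"C1": 4, "C2": 4, "C3": 8, "C4": 16, "C5": 32}
STAGE_CHANNELS = {"C1": 64, "C2": 256, "C3": 512, "C4": 1024, "C5": 2048}


def backbone_shapes(image_size):
    """Return {stage: (height, width, channels)} for a square input (hypothetical helper)."""
    return {
        name: (image_size // stride, image_size // stride, STAGE_CHANNELS[name])
        for name, stride in STAGE_STRIDES.items()
    }


shapes = backbone_shapes(1024)
for name, shape in shapes.items():
    print(name, shape)
```

Changing the argument to another multiple of 64, such as 512, shows how every stage shrinks proportionally with the input.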

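2. Top-down layers and lateral connections

This part builds the feature pyramid: starting from C5, each level is upsampled by a factor of 2 and added to the 1*1-convolved backbone feature map of the same resolution (the lateral connection), and a 3*3 convolution then produces the final P2 to P5.

Structure diagram:

Source code: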

        # ************************ 2. Top-down layers ************************
        # Top-down Layers
        # TODO: add assert to verify feature map sizes match what's in config

        # Apply 256 1*1 convolution kernels to C5 to get 32*32*256, denoted P5
        # P5 = conv1(C5)
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)  # TOP_DOWN_PYRAMID_SIZE = 256: channel depth of the top-down pyramid layers

        # Upsample P5 by a factor of 2 to get 64 * 64 * 256, then add the result of applying 256 1 * 1 kernels to C4; the sum, 64 * 64 * 256, is P4
        # P4 = up2(P5) + conv1(C4)
        P4 = KL.Add(name="fpn_p4add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])

        # Upsample P4 by a factor of 2 to get 128 * 128 * 256, then add the result of applying 256 1 * 1 kernels to C3; the sum, 128 * 128 * 256, is P3
        # P3 = up2(P4) + conv1(C3)
        P3 = KL.Add(name="fpn_p3add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])

        # Upsample P3 by a factor of 2 to get 256 * 256 * 256, then add the result of applying 256 1 * 1 kernels to C2; the sum, 256 * 256 * 256, is P2
        # P2 = up2(P3) + conv1(C2)
        P2 = KL.Add(name="fpn_p2add")([
            KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
            KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])

        # Attach 3x3 conv to all P layers to get the final feature maps.
        P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
        P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
        P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
        P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
        # P6 is used for the 5th anchor scale in RPN. Generated by subsampling from P5 with stride of 2.
        # The max pooling below uses a 1*1 window with stride 2, so it simply keeps every second pixel of P5: 16 * 16 * 256, denoted P6
        P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)

        # Note that P6 is used in RPN, but not in the classifier heads.
        rpn_feature_maps = [P2, P3, P4, P5, P6]
        mrcnn_feature_maps = [P2, P3, P4, P5]
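
To make the data flow of the top-down pathway easier to follow, here is a minimal NumPy sketch of the same merge pattern. It is an illustration under stated assumptions, not the repository's implementation: the 1*1 convolutions use random weights without bias, the final 3*3 smoothing convolutions are left out, the inputs are random stand-ins for the backbone outputs of a 128*128 image, and the helper names `conv1x1` and `upsample2x` are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
PYRAMID_SIZE = 256  # same value as TOP_DOWN_PYRAMID_SIZE in the code above


def conv1x1(x, out_channels):
    """1x1 convolution as a per-pixel matrix multiply (random weights, no bias)."""
    w = rng.standard_normal((x.shape[-1], out_channels)) * 0.01
    return x @ w


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


# Stand-ins for the backbone outputs of a 128x128 image (strides 4, 8, 16, 32).
C2 = rng.standard_normal((32, 32, 256))
C3 = rng.standard_normal((16, 16, 512))
C4 = rng.standard_normal((8, 8, 1024))
C5 = rng.standard_normal((4, 4, 2048))

P5 = conv1x1(C5, PYRAMID_SIZE)
P4 = upsample2x(P5) + conv1x1(C4, PYRAMID_SIZE)
P3 = upsample2x(P4) + conv1x1(C3, PYRAMID_SIZE)
P2 = upsample2x(P3) + conv1x1(C2, PYRAMID_SIZE)
# Pooling with a 1x1 window and stride 2 just keeps every second pixel.
P6 = P5[::2, ::2]

for name, p in [("P2", P2), ("P3", P3), ("P4", P4), ("P5", P5), ("P6", P6)]:
    print(name, p.shape)
```

Every level ends up with the same channel depth, which is what lets the RPN and the heads share weights across pyramid levels.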

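3. Anchor generation

The source code has a dedicated function for this, generate_anchors(). I ran it step by step in a Jupyter notebook on a small example; the printed values make the process fairly intuitive, so I will keep the commentary short.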

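import numpy as np

scales=5              # side length of the square anchor, in pixels
ratios=[0.5, 1, 2]    # anchor ratios (width / height) at each cell: 1 is a square anchor, 0.5 is a tall anchor (width = height / 2)
shape=[4, 4]          # [height, width] of the feature map, one of backbone_shapes = [[256, 256],... [16, 16]]; [4, 4] keeps the example small
feature_stride=8      # one of BACKBONE_STRIDES = [4, 8, 16, 32, 64]: downscaling factor of the feature map relative to the image
anchor_stride=1       # generate anchors at every anchor_stride-th feature-map cell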

scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))  # pair every scale with every ratio
scales = scales.flatten()  # scales=[[5] [5] [5]] -> [5 5 5]
ratios = ratios.flatten()  # ratios=[[0.5] [1. ] [2. ]] -> [0.5 1.  2. ]

print("Base anchor size (scales) for this feature map:", scales)
print("Width/height ratios:", ratios)

Base anchor size (scales) for this feature map: [5 5 5]
Width/height ratios: [0.5 1. 2. ]

# Enumerate heights and widths from scales and ratios
heights = scales / np.sqrt(ratios)  # heights = [7.07 5.  3.54  ]
widths = scales * np.sqrt(ratios)  # widths =[3.54   5. 7.07]

print("Base anchor heights for this feature map:", heights)
print("Base anchor widths for this feature map:", widths)

Base anchor heights for this feature map: [7.07106781 5. 3.53553391]
Base anchor widths for this feature map: [3.53553391 5. 7.07106781]
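
Dividing and multiplying by sqrt(ratio) is what keeps the anchor area fixed while changing its shape. The short check below, written for this walkthrough rather than taken from the repository, confirms that height * width stays at scale squared and that width / height gives back the ratio.

```python
import numpy as np

scale = 5
ratios = np.array([0.5, 1, 2])

heights = scale / np.sqrt(ratios)
widths = scale * np.sqrt(ratios)

# Every anchor keeps the area scale**2, and width / height recovers the ratio.
areas = heights * widths
recovered_ratios = widths / heights
print(areas)
print(recovered_ratios)
```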

# Enumerate shifts in feature space
shifts_y = np.arange(0, shape[0],
                     anchor_stride) * feature_stride  # [0, 1, ..., shape[0]-1] * feature_stride (downscaling factor from the image to this feature map)
shifts_x = np.arange(0, shape[1],
                     anchor_stride) * feature_stride  # with feature_stride=8 here, shifts_x = [0, 8, 16, 24]
shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)  # numpy.meshgrid() builds the coordinate matrices of the grid points

print("Coordinate matrix shifts_x:", shifts_x)
print("Coordinate matrix shifts_y:", shifts_y)

Coordinate matrix shifts_x: [[ 0 8 16 24]
[ 0 8 16 24]
[ 0 8 16 24]
[ 0 8 16 24]]
Coordinate matrix shifts_y: [[ 0 0 0 0]
[ 8 8 8 8]
[16 16 16 16]
[24 24 24 24]]
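
The role of anchor_stride is easier to see when it is not 1. The following sketch reuses the same shape and feature_stride and compares anchor_stride=1 with anchor_stride=2; the helper `grid_positions` is a hypothetical name introduced only for this comparison.

```python
import numpy as np

shape = [4, 4]
feature_stride = 8


def grid_positions(anchor_stride):
    """Anchor centre coordinates in image pixels (hypothetical helper)."""
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    return np.meshgrid(shifts_x, shifts_y)


xs1, ys1 = grid_positions(1)  # a centre at every feature-map cell
xs2, ys2 = grid_positions(2)  # a centre at every second cell
print(xs1.shape, xs2.shape)
print(xs2)
print(ys2)
```

With anchor_stride=2 the number of centres drops from 16 to 4, so the anchor count falls by the same factor.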

# Enumerate combinations of shifts, widths, and heights
box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

print("box_widths：", box_widths)
print("box_centers_x：", box_centers_x)
print("box_heights：", box_heights)
print("box_centers_y：", box_centers_y)

box_widths： [[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]
[3.53553391 5. 7.07106781]]
box_centers_x： [[ 0 0 0]
[ 8 8 8]
[16 16 16]
[24 24 24]
[ 0 0 0]
[ 8 8 8]
[16 16 16]
[24 24 24]
[ 0 0 0]
[ 8 8 8]
[16 16 16]
[24 24 24]
[ 0 0 0]
[ 8 8 8]
[16 16 16]
[24 24 24]]
box_heights： [[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]
[7.07106781 5. 3.53553391]]
box_centers_y： [[ 0 0 0]
[ 0 0 0]
[ 0 0 0]
[ 0 0 0]
[ 8 8 8]
[ 8 8 8]
[ 8 8 8]
[ 8 8 8]
[16 16 16]
[16 16 16]
[16 16 16]
[16 16 16]
[24 24 24]
[24 24 24]
[24 24 24]
[24 24 24]]
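
One detail worth spelling out: np.meshgrid flattens each input before building the grid, so passing the 4*4 shifts_x matrix together with the 3 widths yields 16*3 arrays, with one row per grid position and one column per ratio. The small check below illustrates this; the widths are rounded values chosen only for readability.

```python
import numpy as np

widths = np.array([3.5, 5.0, 7.0])            # rounded widths, for illustration only
shifts_x = np.tile(np.arange(4) * 8, (4, 1))  # the 4x4 matrix [[0, 8, 16, 24], ...]

box_widths, box_centers_x = np.meshgrid(widths, shifts_x)

print(box_widths.shape, box_centers_x.shape)
# Each row of box_centers_x repeats one entry of the flattened shifts_x.
rows_match = np.array_equal(box_centers_x[:, 0], shifts_x.ravel())
print(rows_match)
```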

# Reshape to get a list of (y, x) and a list of (h, w)
box_centers = np.stack(
    [box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])

print("box_centers：", box_centers)
print("box_sizes：", box_sizes)

box_centers： [[ 0 0]
[ 0 0]
[ 0 0]
[ 0 8]
[ 0 8]
[ 0 8]
[ 0 16]
[ 0 16]
[ 0 16]
[ 0 24]
[ 0 24]
[ 0 24]
[ 8 0]
[ 8 0]
[ 8 0]
[ 8 8]
[ 8 8]
[ 8 8]
[ 8 16]
[ 8 16]
[ 8 16]
[ 8 24]
[ 8 24]
[ 8 24]
[16 0]
[16 0]
[16 0]
[16 8]
[16 8]
[16 8]
[16 16]
[16 16]
[16 16]
[16 24]
[16 24]
[16 24]
[24 0]
[24 0]
[24 0]
[24 8]
[24 8]
[24 8]
[24 16]
[24 16]
[24 16]
[24 24]
[24 24]
[24 24]]
box_sizes： [[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]
[7.07106781 3.53553391]
[5. 5. ]
[3.53553391 7.07106781]]

# Convert to corner coordinates (y1, x1, y2, x2)
boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                        box_centers + 0.5 * box_sizes], axis=1)

print("boxes：", boxes)

boxes： [[-3.53553391 -1.76776695 3.53553391 1.76776695]
[-2.5 -2.5 2.5 2.5 ]
[-1.76776695 -3.53553391 1.76776695 3.53553391]
[-3.53553391 6.23223305 3.53553391 9.76776695]
[-2.5 5.5 2.5 10.5 ]
[-1.76776695 4.46446609 1.76776695 11.53553391]
[-3.53553391 14.23223305 3.53553391 17.76776695]
[-2.5 13.5 2.5 18.5 ]
[-1.76776695 12.46446609 1.76776695 19.53553391]
[-3.53553391 22.23223305 3.53553391 25.76776695]
[-2.5 21.5 2.5 26.5 ]
[-1.76776695 20.46446609 1.76776695 27.53553391]
[ 4.46446609 -1.76776695 11.53553391 1.76776695]
[ 5.5 -2.5 10.5 2.5 ]
[ 6.23223305 -3.53553391 9.76776695 3.53553391]
[ 4.46446609 6.23223305 11.53553391 9.76776695]
[ 5.5 5.5 10.5 10.5 ]
[ 6.23223305 4.46446609 9.76776695 11.53553391]
[ 4.46446609 14.23223305 11.53553391 17.76776695]
[ 5.5 13.5 10.5 18.5 ]
[ 6.23223305 12.46446609 9.76776695 19.53553391]
[ 4.46446609 22.23223305 11.53553391 25.76776695]
[ 5.5 21.5 10.5 26.5 ]
[ 6.23223305 20.46446609 9.76776695 27.53553391]
[12.46446609 -1.76776695 19.53553391 1.76776695]
[13.5 -2.5 18.5 2.5 ]
[14.23223305 -3.53553391 17.76776695 3.53553391]
[12.46446609 6.23223305 19.53553391 9.76776695]
[13.5 5.5 18.5 10.5 ]
[14.23223305 4.46446609 17.76776695 11.53553391]
[12.46446609 14.23223305 19.53553391 17.76776695]
[13.5 13.5 18.5 18.5 ]
[14.23223305 12.46446609 17.76776695 19.53553391]
[12.46446609 22.23223305 19.53553391 25.76776695]
[13.5 21.5 18.5 26.5 ]
[14.23223305 20.46446609 17.76776695 27.53553391]
[20.46446609 -1.76776695 27.53553391 1.76776695]
[21.5 -2.5 26.5 2.5 ]
[22.23223305 -3.53553391 25.76776695 3.53553391]
[20.46446609 6.23223305 27.53553391 9.76776695]
[21.5 5.5 26.5 10.5 ]
[22.23223305 4.46446609 25.76776695 11.53553391]
[20.46446609 14.23223305 27.53553391 17.76776695]
[21.5 13.5 26.5 18.5 ]
[22.23223305 12.46446609 25.76776695 19.53553391]
[20.46446609 22.23223305 27.53553391 25.76776695]
[21.5 21.5 26.5 26.5 ]
[22.23223305 20.46446609 25.76776695 27.53553391]]
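
Putting the notebook cells back together gives a single function, sketched below from the steps above. The second half calls it once per pyramid level and counts the anchors. BACKBONE_STRIDES comes from the comments earlier in this section, and backbone_shapes follows from those strides for a 1024*1024 image, but the RPN_ANCHOR_SCALES values are an assumption made for this illustration.

```python
import numpy as np


def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    """Return anchors as rows of (y1, x1, y2, x2) in image pixels."""
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales, ratios = scales.flatten(), ratios.flatten()
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)

    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)

    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)

    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    return np.concatenate([box_centers - 0.5 * box_sizes,
                           box_centers + 0.5 * box_sizes], axis=1)


# The worked example from this section: 4 * 4 positions, 3 ratios each.
anchors = generate_anchors(5, [0.5, 1, 2], [4, 4], 8, 1)
print(anchors.shape)

# One call per pyramid level; the scales are assumed values for illustration.
RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)
BACKBONE_STRIDES = [4, 8, 16, 32, 64]
backbone_shapes = [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
pyramid = np.concatenate([
    generate_anchors(scale, [0.5, 1, 2], level_shape, stride, 1)
    for scale, level_shape, stride
    in zip(RPN_ANCHOR_SCALES, backbone_shapes, BACKBONE_STRIDES)
], axis=0)
print(pyramid.shape)
```

Note that anchors near the border extend past the image, as the negative coordinates in the output above show; this sketch does not clip them.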