Feature maps: the feature maps extracted by the CNN backbone and consumed later by the RPN and the RCNN heads. The input image size must be a multiple of 64; the reason is explained below.
Starting from samples/coco/coco.py:
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=args.logs)
we arrive at mrcnn/model.py:
def __init__(self, mode, config, model_dir):
    """
    mode: Either "training" or "inference"
    config: A Sub-class of the Config class
    model_dir: Directory to save training logs and trained weights
    """
    assert mode in ['training', 'inference']
    self.mode = mode
    self.config = config
    self.model_dir = model_dir
    self.set_log_dir()
    self.keras_model = self.build(mode=mode, config=config)
and then, still in mrcnn/model.py:
def build(self, mode, config):
and finally to the model-building part of the build function:
if callable(config.BACKBONE):
    _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                        train_bn=config.TRAIN_BN)
else:
    _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                     stage5=True, train_bn=config.TRAIN_BN)
config.BACKBONE is the string 'resnet101', which is not callable, so the else branch runs and the backbone is built by resnet_graph. The callable branch only exists so that a config can plug in a custom backbone function instead of the built-in ResNet.
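To illustrate that branch, here is a hypothetical sketch of passing a callable backbone; my_backbone and MyConfig are made-up names for illustration, not part of the repository.

from mrcnn.config import Config
from mrcnn.model import resnet_graph

def my_backbone(input_image, stage5=True, train_bn=True):
    # A custom backbone must return [C1, C2, C3, C4, C5], just like resnet_graph.
    return resnet_graph(input_image, "resnet50", stage5=stage5, train_bn=train_bn)

class MyConfig(Config):
    NAME = "custom_backbone"
    BACKBONE = my_backbone  # callable(config.BACKBONE) is now True
    # The matterport config also uses COMPUTE_BACKBONE_SHAPE when BACKBONE
    # is a callable, so the per-level feature-map sizes can be derived.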
Now we get to the model construction proper:
def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
Input parameters:
input_image: (batch_size, 1024, 1024, 3)
architecture: resnet101
stage5: True
train_bn: False here; it is configured again later for training.
Next, let's go through the stages one by one.
Stage1
# Stage 1
x = KL.ZeroPadding2D((3, 3))(input_image)
x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
x = BatchNorm(name='bn_conv1')(x, training=train_bn)
x = KL.Activation('relu')(x)
C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
The first ? is the batch size, followed by the feature-map height and width and then the channel count. Stage 1 produces a tensor of shape (?, 256, 256, 64).
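For reference, a minimal sketch of the spatial-size arithmetic behind that (?, 256, 256, 64) result, assuming a 1024x1024 input:

import math

size = 1024
size = size + 2 * 3         # ZeroPadding2D((3, 3)): 1024 -> 1030
size = (size - 7) // 2 + 1  # 7x7 conv, stride 2, no padding: 1030 -> 512
size = math.ceil(size / 2)  # 3x3 max pool, stride 2, padding="same": 512 -> 256
print(size)                 # 256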
Stage2(C2)
# Stage 2
x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
The code above uses conv_block and identity_block, so let's first look at what these two functions actually do.
conv_block
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    """conv_block is the block that has a conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
    And the shortcut should have subsample=(2,2) as well
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = KL.Add()([x, shortcut])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
Input parameters:
input_tensor: depends on the call site; here it is (?, 256, 256, 64)
kernel_size:3
filters:[64,64,256]
stage:2
block:'a'
strides: (1, 1) or (2, 2); this example uses (1, 1)
use_bias:True
train_bn:False
The block returns a tensor of shape (?, 256, 256, 256); a quick shape check follows.
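A minimal shape-check sketch, assuming the matterport environment (TF1-era Keras) and that conv_block is importable from mrcnn.model as shown above:

import keras.layers as KL
import keras.models as KM
from mrcnn.model import conv_block

inp = KL.Input(shape=(256, 256, 64))            # (?, 256, 256, 64)
out = conv_block(inp, 3, [64, 64, 256], stage=2, block='a',
                 strides=(1, 1), train_bn=False)
print(KM.Model(inp, out).output_shape)          # (None, 256, 256, 256)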
identity_block
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    """The identity_block is the block that has no conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
Input parameters:
input_tensor: depends on the call site; here it is (?, 256, 256, 256)
kernel_size:3
filters:[64,64,256]
stage:2
block:'b'
use_bias:True
train_bn:False
The function returns a tensor of shape (?, 256, 256, 256). Because the shortcut adds input_tensor to the main path directly, the input's channel count must already equal nb_filter3 (256 here); a small sketch of that constraint follows.
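Another minimal sketch under the same assumptions as before, just to show that constraint:

import keras.layers as KL
import keras.models as KM
from mrcnn.model import identity_block

inp = KL.Input(shape=(256, 256, 256))   # channels must equal nb_filter3 = 256
out = identity_block(inp, 3, [64, 64, 256], stage=2, block='b', train_bn=False)
print(KM.Model(inp, out).output_shape)  # (None, 256, 256, 256)
# An input with a different channel count, e.g. (256, 256, 64), would make
# KL.Add() fail with a shape mismatch, because there is no projection shortcut.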
So Stage 2 runs one conv_block followed by two identity_blocks:
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 2 | 'a' | 3 | 1 | [64,64,256] | (?,256,256,64) | (?,256,256,256) |
| identity_block | 2 | 'b' | 3 | 1 | [64,64,256] | (?,256,256,256) | (?,256,256,256) |
| identity_block | 2 | 'c' | 3 | 1 | [64,64,256] | (?,256,256,256) | (?,256,256,256) |
Both x and C2 end up with shape (?, 256, 256, 256).
Stage3(C3)
# Stage 3
x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
Stage 3 runs one conv_block followed by three identity_blocks:
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 3 | 'a' | 3 | 2 | [128,128,512] | (?,256,256,256) | (?,128,128,512) |
| identity_block | 3 | 'b' | 3 | 1 | [128,128,512] | (?,128,128,512) | (?,128,128,512) |
| identity_block | 3 | 'c' | 3 | 1 | [128,128,512] | (?,128,128,512) | (?,128,128,512) |
| identity_block | 3 | 'd' | 3 | 1 | [128,128,512] | (?,128,128,512) | (?,128,128,512) |
Both x and C3 end up with shape (?, 128, 128, 512).
Stage4(C4)
# Stage 4
x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
block_count = {"resnet50": 5, "resnet101": 22}[architecture]
for i in range(block_count):
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
C4 = x
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 4 | 'a' | 3 | 2 | [256,256,1024] | (?,128,128,512) | (?,64,64,1024) |
| identity_block (repeated 22 times; i is the iteration index) | 4 | chr(98 + i) | 3 | 1 | [256,256,1024] | (?,64,64,1024) | (?,64,64,1024) |
Both x and C4 end up with shape (?, 64, 64, 1024).
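As a side note, the block names for those 22 identity blocks come from chr(98 + i); a quick sketch:

block_count = {"resnet50": 5, "resnet101": 22}["resnet101"]
print([chr(98 + i) for i in range(block_count)])
# ['b', 'c', 'd', ..., 'w'] -> layers res4b_* through res4w_* for resnet101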
Stage5(C5)
x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 5 | 'a' | 3 | 2 | [512,512,2048] | (?,64,64,1024) | (?,32,32,2048) |
| identity_block | 5 | 'b' | 3 | 1 | [512,512,2048] | (?,32,32,2048) | (?,32,32,2048) |
| identity_block | 5 | 'c' | 3 | 1 | [512,512,2048] | (?,32,32,2048) | (?,32,32,2048) |
Both x and C5 end up with shape (?, 32, 32, 2048).
Finally, return [C1, C2, C3, C4, C5] hands the five stage outputs back to build.
P5
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
C5 has shape (?, 32, 32, 2048), and the 1x1 convolution projects it down to TOP_DOWN_PYRAMID_SIZE = 256 channels, so P5 has shape (?, 32, 32, 256).
P4
P4 = KL.Add(name="fpn_p4add")([
KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
P5 is upsampled by a factor of 2 to (?, 64, 64, 256). C4 goes through a 1x1 convolution that changes its channel count, giving (?, 64, 64, 256). Adding these two tensors produces P4 with shape (?, 64, 64, 256).
P3
P3 = KL.Add(name="fpn_p3add")([
KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
P4 is upsampled by a factor of 2 to (?, 128, 128, 256). C3 goes through a 1x1 convolution that changes its channel count, giving (?, 128, 128, 256). Adding these two tensors produces P3 with shape (?, 128, 128, 256).
P2
P2 = KL.Add(name="fpn_p2add")([
KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
P3 is upsampled by a factor of 2 to (?, 256, 256, 256). C2 goes through a 1x1 convolution that changes its channel count, giving (?, 256, 256, 256). Adding these two tensors produces P2 with shape (?, 256, 256, 256). The same merge pattern repeats at every level; a generic sketch follows.
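The three blocks above all follow the same lateral-plus-top-down pattern; here is a generic sketch of it (fpn_merge is my own helper name, not the repository's code):

import keras.layers as KL

def fpn_merge(p_coarser, c_lateral, channels=256, name="fpn_merge"):
    """Upsample the coarser pyramid level and add the 1x1-projected lateral map."""
    up = KL.UpSampling2D(size=(2, 2), name=name + "_up")(p_coarser)   # e.g. (?, 32, 32, 256) -> (?, 64, 64, 256)
    lat = KL.Conv2D(channels, (1, 1), name=name + "_lat")(c_lateral)  # e.g. (?, 64, 64, 1024) -> (?, 64, 64, 256)
    return KL.Add(name=name + "_add")([up, lat])                      # (?, 64, 64, 256)

# For example, P4 above is effectively fpn_merge(P5, C4, name="fpn_p4").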
The final P2, P3, P4, P5, P6
# Attach 3x3 conv to all P layers to get the final feature maps.
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# P6 is used for the 5th anchor scale in RPN. Generated by
# subsampling from P5 with stride of 2.
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
P2, P3, P4, and P5 each go through a 3x3 convolution that keeps the spatial size but refines the features. P6 is produced by downsampling P5 once more, giving a shape of (?, 16, 16, 256).
This is why the image size must be a multiple of 1024/16 = 64: P6 sits at a stride of 64 relative to the input, so every image side has to survive six halvings without producing fractional feature-map sizes.
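The build function in mrcnn/model.py guards against incompatible sizes with a check along these lines (lightly paraphrased):

# Near the top of build(); 2**6 = 64, one factor of 2 per downsampling step.
h, w = config.IMAGE_SHAPE[:2]
if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
    raise Exception("Image size must be dividable by 2 at least 6 times "
                    "to avoid fractions when downscaling and upscaling. "
                    "For example, use 256, 320, 384, 448, 512, ... etc.")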
Feature maps used by the RPN and the MRCNN heads
# Note that P6 is used in RPN, but not in the classifier heads.
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
As the code shows, the RPN uses P2 through P6, while the MRCNN heads only use P2 through P5.
My take: the extra P6 level lets the RPN propose a few more boxes so that some objects are not missed, while the MRCNN heads skip it mainly because of the mask branch: on a feature map that small, the segmentation would not be very accurate anyway.
The final shapes of these feature maps (with a batch size of 1) are:
| feature map | shape |
|---|---|
| P2 | (1, 256, 256, 256) |
| P3 | (1, 128, 128, 256) |
| P4 | (1, 64, 64, 256) |
| P5 | (1, 32, 32, 256) |
| P6 | (1, 16, 16, 256) |
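These sizes can also be read off from the per-level strides (the matterport config defines BACKBONE_STRIDES = [4, 8, 16, 32, 64]); a minimal sketch:

# One pyramid level per stride: P2, P3, P4, P5, P6.
image_size = 1024
strides = [4, 8, 16, 32, 64]
print([image_size // s for s in strides])   # [256, 128, 64, 32, 16]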