Feature maps: the feature maps extracted by the CNN backbone and consumed later by the RPN and the RCNN heads. The input image size must be a multiple of 64; the reason is explained below.
Starting from samples/coco/coco.py:
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=args.logs)
we arrive at mrcnn/model.py:
def __init__(self, mode, config, model_dir):
    """
    mode: Either "training" or "inference"
    config: A Sub-class of the Config class
    model_dir: Directory to save training logs and trained weights
    """
    assert mode in ['training', 'inference']
    self.mode = mode
    self.config = config
    self.model_dir = model_dir
    self.set_log_dir()
    self.keras_model = self.build(mode=mode, config=config)
and then, still in mrcnn/model.py:
def build(self, mode, config):
and finally to the model-building part of the build function:
if callable(config.BACKBONE):
    _, C2, C3, C4, C5 = config.BACKBONE(input_image, stage5=True,
                                        train_bn=config.TRAIN_BN)
else:
    _, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                     stage5=True, train_bn=config.TRAIN_BN)
config.BACKBONE is the string 'resnet101', which is not callable, so the else branch runs and the backbone is built by resnet_graph. The callable branch only exists so that a config can plug in a custom backbone function instead of the built-in ResNet.
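To illustrate that branch, here is a hypothetical sketch of passing a callable backbone; my_backbone and MyConfig are made-up names for illustration, not part of the repository.

from mrcnn.config import Config
from mrcnn.model import resnet_graph

def my_backbone(input_image, stage5=True, train_bn=True):
    # A custom backbone must return [C1, C2, C3, C4, C5], just like resnet_graph.
    return resnet_graph(input_image, "resnet50", stage5=stage5, train_bn=train_bn)

class MyConfig(Config):
    NAME = "custom_backbone"
    BACKBONE = my_backbone  # callable(config.BACKBONE) is now True
    # The matterport config also uses COMPUTE_BACKBONE_SHAPE when BACKBONE
    # is a callable, so the per-level feature-map sizes can be derived.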
Now we get to the model construction proper:
def resnet_graph(input_image, architecture, stage5=False, train_bn=True):
Input parameters:
input_image: (batch_size, 1024, 1024, 3)
architecture: resnet101
stage5: True
train_bn: False here; it is configured again later for training.
Next, let's go through the stages one by one.
Stage1
# Stage 1
x = KL.ZeroPadding2D((3, 3))(input_image)
x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
x = BatchNorm(name='bn_conv1')(x, training=train_bn)
x = KL.Activation('relu')(x)
C1 = x = KL.MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
The first ? is the batch size, followed by the feature-map height and width and then the channel count. Stage 1 produces a tensor of shape (?, 256, 256, 64).
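For reference, a minimal sketch of the spatial-size arithmetic behind that (?, 256, 256, 64) result, assuming a 1024x1024 input:

import math

size = 1024
size = size + 2 * 3         # ZeroPadding2D((3, 3)): 1024 -> 1030
size = (size - 7) // 2 + 1  # 7x7 conv, stride 2, no padding: 1030 -> 512
size = math.ceil(size / 2)  # 3x3 max pool, stride 2, padding="same": 512 -> 256
print(size)                 # 256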
Stage2(C2)
# Stage 2
x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
The code above uses conv_block and identity_block, so let's first look at what these two functions actually do.
conv_block
def conv_block(input_tensor, kernel_size, filters, stage, block,
               strides=(2, 2), use_bias=True, train_bn=True):
    """conv_block is the block that has a conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    Note that from stage 3, the first conv layer at main path is with subsample=(2,2)
    And the shortcut should have subsample=(2,2) as well
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), strides=strides,
                  name=conv_name_base + '2a', use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    shortcut = KL.Conv2D(nb_filter3, (1, 1), strides=strides,
                         name=conv_name_base + '1', use_bias=use_bias)(input_tensor)
    shortcut = BatchNorm(name=bn_name_base + '1')(shortcut, training=train_bn)

    x = KL.Add()([x, shortcut])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
Input parameters:
input_tensor: depends on the call site; here it is (?, 256, 256, 64)
kernel_size:3
filters:[64,64,256]
stage:2
block:'a'
strides: (1, 1) or (2, 2); this example uses (1, 1)
use_bias:True
train_bn:False
The block returns a tensor of shape (?, 256, 256, 256); a quick shape check follows.
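A minimal shape-check sketch, assuming the matterport environment (TF1-era Keras) and that conv_block is importable from mrcnn.model as shown above:

import keras.layers as KL
import keras.models as KM
from mrcnn.model import conv_block

inp = KL.Input(shape=(256, 256, 64))            # (?, 256, 256, 64)
out = conv_block(inp, 3, [64, 64, 256], stage=2, block='a',
                 strides=(1, 1), train_bn=False)
print(KM.Model(inp, out).output_shape)          # (None, 256, 256, 256)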
identity_block
def identity_block(input_tensor, kernel_size, filters, stage, block,
                   use_bias=True, train_bn=True):
    """The identity_block is the block that has no conv layer at shortcut
    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of middle conv layer at main path
        filters: list of integers, the nb_filters of 3 conv layer at main path
        stage: integer, current stage label, used for generating layer names
        block: 'a','b'..., current block label, used for generating layer names
        use_bias: Boolean. To use or not use a bias in conv layers.
        train_bn: Boolean. Train or freeze Batch Norm layers
    """
    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = KL.Conv2D(nb_filter1, (1, 1), name=conv_name_base + '2a',
                  use_bias=use_bias)(input_tensor)
    x = BatchNorm(name=bn_name_base + '2a')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter2, (kernel_size, kernel_size), padding='same',
                  name=conv_name_base + '2b', use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2b')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    x = KL.Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c',
                  use_bias=use_bias)(x)
    x = BatchNorm(name=bn_name_base + '2c')(x, training=train_bn)

    x = KL.Add()([x, input_tensor])
    x = KL.Activation('relu', name='res' + str(stage) + block + '_out')(x)
    return x
Input parameters:
input_tensor: depends on the call site; here it is (?, 256, 256, 256)
kernel_size:3
filters:[64,64,256]
stage:2
block:'b'
use_bias:True
train_bn:False
The function returns a tensor of shape (?, 256, 256, 256). Because the shortcut adds input_tensor to the main path directly, the input's channel count must already equal nb_filter3 (256 here); a small sketch of that constraint follows.
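Another minimal sketch under the same assumptions as before, just to show that constraint:

import keras.layers as KL
import keras.models as KM
from mrcnn.model import identity_block

inp = KL.Input(shape=(256, 256, 256))   # channels must equal nb_filter3 = 256
out = identity_block(inp, 3, [64, 64, 256], stage=2, block='b', train_bn=False)
print(KM.Model(inp, out).output_shape)  # (None, 256, 256, 256)
# An input with a different channel count, e.g. (256, 256, 64), would make
# KL.Add() fail with a shape mismatch, because there is no projection shortcut.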
So Stage 2 runs one conv_block followed by two identity_blocks:
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 2 | 'a' | 3 | 1 | [64,64,256] | (?,256,256,64) | (?,256,256,256) |
| identity_block | 2 | 'b' | 3 | 1 | [64,64,256] | (?,256,256,256) | (?,256,256,256) |
| identity_block | 2 | 'c' | 3 | 1 | [64,64,256] | (?,256,256,256) | (?,256,256,256) |
Both x and C2 end up with shape (?, 256, 256, 256).
Stage3(C3)
# Stage 3
x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
Stage 3 runs one conv_block followed by three identity_blocks:
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 3 | 'a' | 3 | 2 | [128,128,512] | (?,256,256,256) | (?,128,128,512) |
| identity_block | 3 | 'b' | 3 | 1 | [128,128,512] | (?,128,128,512) | (?,128,128,512) |
| identity_block | 3 | 'c' | 3 | 1 | [128,128,512] | (?,128,128,512) | (?,128,128,512) |
| identity_block | 3 | 'd' | 3 | 1 | [128,128,512] | (?,128,128,512) | (?,128,128,512) |
Both x and C3 end up with shape (?, 128, 128, 512).
Stage4(C4)
# Stage 4
x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
block_count = {"resnet50": 5, "resnet101": 22}[architecture]
for i in range(block_count):
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
C4 = x
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 4 | 'a' | 3 | 2 | [256,256,1024] | (?,128,128,512) | (?,64,64,1024) |
| identity_block (repeated 22 times; i is the iteration index) | 4 | chr(98 + i) | 3 | 1 | [256,256,1024] | (?,64,64,1024) | (?,64,64,1024) |
Both x and C4 end up with shape (?, 64, 64, 1024).
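As a side note, the block names for those 22 identity blocks come from chr(98 + i); a quick sketch:

block_count = {"resnet50": 5, "resnet101": 22}["resnet101"]
print([chr(98 + i) for i in range(block_count)])
# ['b', 'c', 'd', ..., 'w'] -> layers res4b_* through res4w_* for resnet101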
Stage5(C5)
x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
| layer | stage | block | kernel_size | stride | filters | input shape | output shape |
|---|---|---|---|---|---|---|---|
| conv_block | 5 | 'a' | 3 | 2 | [512,512,2048] | (?,64,64,1024) | (?,32,32,2048) |
| identity_block | 5 | 'b' | 3 | 1 | [512,512,2048] | (?,32,32,2048) | (?,32,32,2048) |
| identity_block | 5 | 'c' | 3 | 1 | [512,512,2048] | (?,32,32,2048) | (?,32,32,2048) |
Both x and C5 end up with shape (?, 32, 32, 2048).
Finally, return [C1, C2, C3, C4, C5] hands the five stage outputs back to build.
P5
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
C5 has shape (?, 32, 32, 2048), and the 1x1 convolution projects it down to TOP_DOWN_PYRAMID_SIZE = 256 channels, so P5 has shape (?, 32, 32, 256).
P4
P4 = KL.Add(name="fpn_p4add")([
KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
P5 is upsampled by a factor of 2 to (?, 64, 64, 256). C4 goes through a 1x1 convolution that changes its channel count, giving (?, 64, 64, 256). Adding these two tensors produces P4 with shape (?, 64, 64, 256).
P3
P3 = KL.Add(name="fpn_p3add")([
KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
P4 is upsampled by a factor of 2 to (?, 128, 128, 256). C3 goes through a 1x1 convolution that changes its channel count, giving (?, 128, 128, 256). Adding these two tensors produces P3 with shape (?, 128, 128, 256).
P2
P2 = KL.Add(name="fpn_p2add")([
KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
P3 is upsampled by a factor of 2 to (?, 256, 256, 256). C2 goes through a 1x1 convolution that changes its channel count, giving (?, 256, 256, 256). Adding these two tensors produces P2 with shape (?, 256, 256, 256). The same merge pattern repeats at every level; a generic sketch follows.
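The three blocks above all follow the same lateral-plus-top-down pattern; here is a generic sketch of it (fpn_merge is my own helper name, not the repository's code):

import keras.layers as KL

def fpn_merge(p_coarser, c_lateral, channels=256, name="fpn_merge"):
    """Upsample the coarser pyramid level and add the 1x1-projected lateral map."""
    up = KL.UpSampling2D(size=(2, 2), name=name + "_up")(p_coarser)   # e.g. (?, 32, 32, 256) -> (?, 64, 64, 256)
    lat = KL.Conv2D(channels, (1, 1), name=name + "_lat")(c_lateral)  # e.g. (?, 64, 64, 1024) -> (?, 64, 64, 256)
    return KL.Add(name=name + "_add")([up, lat])                      # (?, 64, 64, 256)

# For example, P4 above is effectively fpn_merge(P5, C4, name="fpn_p4").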
The final P2, P3, P4, P5, P6
# Attach 3x3 conv to all P layers to get the final feature maps.
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# P6 is used for the 5th anchor scale in RPN. Generated by
# subsampling from P5 with stride of 2.
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
P2, P3, P4, and P5 each go through a 3x3 convolution that keeps the spatial size but refines the features. P6 is produced by downsampling P5 once more, giving a shape of (?, 16, 16, 256).
This is why the image size must be a multiple of 1024/16 = 64: P6 sits at a stride of 64 relative to the input, so every image side has to survive six halvings without producing fractional feature-map sizes.
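The build function in mrcnn/model.py guards against incompatible sizes with a check along these lines (lightly paraphrased):

# Near the top of build(); 2**6 = 64, one factor of 2 per downsampling step.
h, w = config.IMAGE_SHAPE[:2]
if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
    raise Exception("Image size must be dividable by 2 at least 6 times "
                    "to avoid fractions when downscaling and upscaling. "
                    "For example, use 256, 320, 384, 448, 512, ... etc.")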
Feature maps used by the RPN and the MRCNN heads
# Note that P6 is used in RPN, but not in the classifier heads.
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
As the code shows, the RPN uses P2 through P6, while the MRCNN heads only use P2 through P5.
My take: the extra P6 level lets the RPN propose a few more boxes so that some objects are not missed, while the MRCNN heads skip it mainly because of the mask branch: on a feature map that small, the segmentation would not be very accurate anyway.
The final shapes of these feature maps (with a batch size of 1) are:
| feature map | shape |
|---|---|
| P2 | (1, 256, 256, 256) |
| P3 | (1, 128, 128, 256) |
| P4 | (1, 64, 64, 256) |
| P5 | (1, 32, 32, 256) |
| P6 | (1, 16, 16, 256) |
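These sizes can also be read off from the per-level strides (the matterport config defines BACKBONE_STRIDES = [4, 8, 16, 32, 64]); a minimal sketch:

# One pyramid level per stride: P2, P3, P4, P5, P6.
image_size = 1024
strides = [4, 8, 16, 32, 64]
print([image_size // s for s in strides])   # [256, 128, 64, 32, 16]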