1. RoI Pooling in Faster R-CNN
The RoI Pooling layer collects the proposals, computes a fixed-size feature map for each one (the "proposal feature maps"), and passes them on to the rest of the network.
① Inputs to RoI Pooling
1. The original feature maps.
2. The proposal boxes output by the RPN (each a different size).
② What RoI Pooling Does
In a conventional CNN, once the network is trained, the input (and therefore output) size must be fixed, because the fully connected layers expect a fixed-length vector.
There are two classic workarounds (both lossy; see the sketch after this list):
1. Crop a patch of the required size out of the image and feed the crop to the network.
2. Warp the whole image to the required size before feeding it in.
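Cropping discards content and warping distorts it. RoI Pooling avoids both by resizing on the feature map instead of the image: each proposal, whatever its size, is split into a fixed grid of bins and max-pooled bin by bin, so every RoI comes out at the same resolution. A minimal NumPy sketch of the idea (roi_max_pool is a hypothetical helper, and the stride-16 feature map with a 7x7 output are illustrative Faster R-CNN defaults, not code from the repo quoted below):

import numpy as np

def roi_max_pool(feature_map, box, output_size=7, stride=16):
    """Naive RoI max pooling for one box.
    feature_map: (H, W, C) conv features; box: (x1, y1, x2, y2) in image pixels.
    """
    # 1st quantization: project the box onto the feature map and round down
    x1, y1, x2, y2 = [int(c / stride) for c in box]
    roi = feature_map[y1:y2 + 1, x1:x2 + 1]
    h, w = roi.shape[:2]
    out = np.zeros((output_size, output_size, feature_map.shape[2]))
    # 2nd quantization: split the RoI into an output_size x output_size grid
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)
    for i in range(output_size):
        for j in range(output_size):
            cell = roi[ys[i]:max(ys[i + 1], ys[i] + 1),
                       xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max(axis=(0, 1))  # max-pool each bin
    return out

fm = np.random.rand(50, 50, 256)                   # e.g. an 800x800 image at stride 16
print(roi_max_pool(fm, (33, 65, 440, 710)).shape)  # (7, 7, 256), whatever the box size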
2. A Brief Introduction to RoIAlign
RoI Pooling works well enough for detection, but when used for segmentation it suffers from a misalignment problem: its two rounds of quantization (rounding the projected box to whole feature-map cells, then rounding the bin boundaries) shift the pooled features away from their true positions. The offset may look small on the feature map, but scaled back to the original image it becomes large. RoIAlign was introduced to remove it; a worked example of the drift follows.
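A quick worked example of that drift (the 665-px box, stride 16, and 7x7 grid are illustrative numbers only):

stride, grid = 16, 7                    # feature stride and RoI Pooling output size
box_w = 665                             # proposal width in image pixels

w_feat = box_w / stride                 # 41.5625 feature cells
w_q = int(w_feat)                       # 1st quantization: 41
print((w_feat - w_q) * stride)          # 9.0 px already lost at the box edge

bin_w = w_q / grid                      # ~5.857 cells per output bin
bin_q = int(bin_w)                      # 2nd quantization: 5
print((bin_w - bin_q) * grid * stride)  # 96.0 px of the box the 7x7 grid never covers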
The idea, roughly:
1. Use bilinear interpolation to read feature values at floating-point coordinates, instead of rounding them to the grid (see the sketch after this list).
2. Split each output bin into 4 equal sub-regions, sample the value at the center of each, and pool over those 4 samples.
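A minimal NumPy sketch of the bilinear sampling in step 1 (a sketch of the math, not the actual kernel; bilinear is a hypothetical helper name):

import numpy as np

def bilinear(feature_map, y, x):
    """Sample a (H, W, C) feature map at a floating-point (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feature_map.shape[0] - 1)
    x1 = min(x0 + 1, feature_map.shape[1] - 1)
    dy, dx = y - y0, x - x0
    # weighted average of the 4 surrounding integer grid points
    return ((1 - dy) * (1 - dx) * feature_map[y0, x0]
            + (1 - dy) * dx * feature_map[y0, x1]
            + dy * (1 - dx) * feature_map[y1, x0]
            + dy * dx * feature_map[y1, x1])

fm = np.arange(16, dtype=float).reshape(4, 4, 1)   # fm[y, x] = 4*y + x
print(bilinear(fm, 1.5, 2.25))                     # [8.25], no rounding needed

For step 2, each of the pool_size x pool_size bins is subdivided into 2x2 sub-cells, the feature value at each sub-cell center is read this way, and the 4 samples are pooled.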
3. Source Code Walkthrough
def fpn_classifier_graph(rois, feature_maps,
                         image_shape, pool_size, num_classes):
    # KL = keras.layers, K = keras.backend; BatchNorm is the repo's thin
    # wrapper around KL.BatchNormalization (snippet from Matterport's Mask R-CNN).
    # ROI Pooling: PyramidROIAlign samples each RoI from the appropriate
    # pyramid level and resizes them all to pool_size x pool_size
    x = PyramidROIAlign([pool_size, pool_size], image_shape,
                        name="roi_align_classifier")([rois] + feature_maps)
    # Two 1024-d "fully connected" layers, implemented as convolutions:
    # a pool_size x pool_size valid conv collapses the spatial dims to 1x1,
    # which is equivalent to an FC layer over the flattened RoI features
    x = KL.TimeDistributed(KL.Conv2D(1024, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(axis=3), name='mrcnn_class_bn1')(x)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(1024, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(axis=3),
                           name='mrcnn_class_bn2')(x)
    x = KL.Activation('relu')(x)
    # Squeeze the 1x1 spatial dims: (batch, num_rois, 1, 1, 1024) -> (batch, num_rois, 1024)
    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)
    # Classifier head: per-RoI class logits and softmax probabilities
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)
    # Box head: 4 refinement values per class
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to (batch, num_rois, num_classes, 4)
    s = K.int_shape(x)
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
    # Per-RoI class logits, class probabilities, and box offsets
    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
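Inside PyramidROIAlign, each RoI also has to be assigned to one pyramid level, since feature_maps here is the list of FPN levels ([P2, P3, P4, P5] in the Matterport code). The assignment follows the FPN paper's heuristic: larger boxes are pooled from coarser levels. A sketch of that rule, with fpn_level as a hypothetical stand-in for the logic inside the layer (k0 = 4 and the 224-px canonical box are the paper's constants; the clamping to P2..P5 matches this repo):

import numpy as np

def fpn_level(box_h, box_w, k0=4, canonical=224.0, k_min=2, k_max=5):
    """Map an RoI (in image pixels) to a pyramid level P_k."""
    k = k0 + np.log2(np.sqrt(box_h * box_w) / canonical)
    return int(np.clip(np.round(k), k_min, k_max))

print(fpn_level(224, 224))   # 4 -> P4: the canonical ImageNet-sized box
print(fpn_level(64, 64))     # 2 -> P2: small boxes use the finest level
print(fpn_level(800, 800))   # 5 -> P5: large boxes use the coarsest level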
4. The Mask Branch
def build_fpn_mask_graph(rois, feature_maps,
                         image_shape, pool_size, num_classes):
    # ROIAlign: pool every RoI to pool_size x pool_size (14x14 here)
    x = PyramidROIAlign([pool_size, pool_size], image_shape,
                        name="roi_align_mask")([rois] + feature_maps)
    # 3x3 conv, "same" padding keeps the spatial size
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(axis=3),
                           name='mrcnn_mask_bn1')(x)
    x = KL.Activation('relu')(x)
    # 3x3 conv
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(axis=3),
                           name='mrcnn_mask_bn2')(x)
    x = KL.Activation('relu')(x)
    # 3x3 conv
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv3")(x)
    x = KL.TimeDistributed(BatchNorm(axis=3),
                           name='mrcnn_mask_bn3')(x)
    x = KL.Activation('relu')(x)
    # 3x3 conv
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv4")(x)
    x = KL.TimeDistributed(BatchNorm(axis=3),
                           name='mrcnn_mask_bn4')(x)
    x = KL.Activation('relu')(x)
    # 2x2 transposed conv, stride 2: upsamples 14x14 -> 28x28
    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)
    # 1x1 conv with sigmoid: one 28x28 binary mask per class
    x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)
    return x
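As a standalone sanity check, here is a minimal Keras sketch of the same mask head applied to a single RoI (dropping TimeDistributed and the repo's custom BatchNorm for brevity; the 14x14 pool size and 256-channel features match the defaults above, while the layer names are hypothetical):

from tensorflow import keras
from tensorflow.keras import layers

pool_size, num_classes = 14, 81

inp = layers.Input(shape=(pool_size, pool_size, 256))
x = inp
for i in range(1, 5):
    # four 3x3 "same" conv + BN + ReLU blocks; spatial size stays 14x14
    x = layers.Conv2D(256, 3, padding="same", name=f"mask_conv{i}")(x)
    x = layers.BatchNormalization(name=f"mask_bn{i}")(x)
    x = layers.Activation("relu")(x)
# 2x2 transposed conv, stride 2: upsample 14x14 -> 28x28
x = layers.Conv2DTranspose(256, 2, strides=2, activation="relu", name="mask_deconv")(x)
# 1x1 conv + sigmoid: one 28x28 binary mask per class
out = layers.Conv2D(num_classes, 1, activation="sigmoid", name="mask")(x)

print(keras.Model(inp, out).output_shape)   # (None, 28, 28, 81)

The sigmoid (rather than a softmax across classes) is the point: each class gets its own independent binary mask, and the class predicted by the classifier head selects which mask is used, decoupling classification from segmentation.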