https://github.com/matterport/Mask_RCNN
The core of this repo is model.py; the other files are helper modules.
This post starts with the build() method of the MaskRCNN class, which constructs the model graph.
- In the feature-extraction backbone the image goes through six halving steps in total, for an overall stride of 64: C1 downsamples by 4x (a stride-2 conv followed by a stride-2 max pool, i.e. two halvings), C3, C4 and C5 each halve again, and one more 2x max pool produces P6 in the FPN. The input image size must therefore be a multiple of 64; the default is 1024.
# The image size must be a multiple of 64, because the feature maps are downscaled by up to 64x.
# Image size must be dividable by 2 multiple times
h, w = config.IMAGE_SHAPE[:2]
if h / 2**6 != int(h / 2**6) or w / 2**6 != int(w / 2**6):
raise Exception("Image size must be dividable by 2 at least 6 times "
"to avoid fractions when downscaling and upscaling."
"For example, use 256, 320, 384, 448, 512, ... etc. ")
- In training mode, several ground-truth inputs must be defined: the RPN targets (match and bbox) and the detection targets (class IDs, boxes and masks). They are used to compute the RPN loss, the classification loss and the mask loss, respectively.
if mode == "training":
# In training mode the model needs GT inputs (class IDs, boxes, masks); box coordinates are also normalized.
# RPN GT
input_rpn_match = KL.Input(
shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
input_rpn_bbox = KL.Input(
shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)
# Detection GT (class IDs, bounding boxes, and masks)
# 1. GT Class IDs (zero padded)
input_gt_class_ids = KL.Input(
shape=[None], name="input_gt_class_ids", dtype=tf.int32)
# 2. GT Boxes in pixels (zero padded)
# [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
input_gt_boxes = KL.Input(
shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
# Normalize coordinates
gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
x, K.shape(input_image)[1:3]))(input_gt_boxes)
# 3. GT Masks (zero padded)
# [batch, height, width, MAX_GT_INSTANCES]
if config.USE_MINI_MASK:
input_gt_masks = KL.Input(
shape=[config.MINI_MASK_SHAPE[0],
config.MINI_MASK_SHAPE[1], None],
name="input_gt_masks", dtype=bool)
else:
input_gt_masks = KL.Input(
shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
name="input_gt_masks", dtype=bool)
elif mode == "inference":