DetectionLayer输入包含ROI,及对应的class,box修正信息,同时还有输入的image信息(image_meta),最终的输出都基于原图上的物体检测,原图的信息来自于image_meta。
# Detections
# output is [batch, num_detections, (y1, x1, y2, x2, class_id, score)] in
# normalized coordinates
detections = DetectionLayer(config, name="mrcnn_detection")(
[rpn_rois, mrcnn_class, mrcnn_bbox, input_image_meta])
input_image_meta项,它记录了每一张图片的原始信息,[batch, n]维矩阵,n是固定的,其生成于config.py文件中。
# Image meta data length
# See compose_image_meta() for details
self.IMAGE_META_SIZE = 1 + 3 + 3 + 4 + 1 + self.NUM_CLASSES
if mode == "square":
# Get new height and width
h, w = image.shape[:2]
top_pad = (max_dim - h) // 2
bottom_pad = max_dim - h - top_pad
left_pad = (max_dim - w) // 2
right_pad = max_dim - w - left_pad
padding = [(top_pad, bottom_pad), (left_pad, right_pad), (0, 0)]
image = np.pad(image, padding, mode='constant', constant_values=0)
window = (top_pad, left_pad, h + top_pad, w + left_pad)
我们将深蓝色的原图(不要求w等于h)通过填充的方式扩展为浅灰色的大图用于feed网络,"window"记录了以新图左上角为原点建立坐标系,原图的左上角点和右下角点的坐标,由于坐标系选取的是像素坐标,"window"记录的就是原始图片的大小,其蕴含了输入图片中真正有意义的位置信息。
用于解析image_meta结构的函数如下:
def parse_image_meta_graph(meta):
"""Parses a tensor that contains image attributes to its components.
See compose_image_meta() for more details.
meta: [batch, meta length] where meta length depends on NUM_CLASSES
Returns a dict of the parsed tensors.
"""
image_id = meta[:, 0]
original_image_shape = meta