if mode == "training":
    # Class ID mask to mark class IDs supported by the dataset the image
    # came from.
    active_class_ids = KL.Lambda(
        lambda x: parse_image_meta_graph(x)["active_class_ids"]
        )(input_image_meta)
    if not config.USE_RPN_ROIS:
        # Ignore predicted ROIs and use ROIs provided as an input.
        input_rois = KL.Input(shape=[config.POST_NMS_ROIS_TRAINING, 4],
                              name="input_roi", dtype=np.int32)
        # Normalize coordinates
        target_rois = KL.Lambda(lambda x: norm_boxes_graph(
            x, K.shape(input_image)[1:3]))(input_rois)
    else:
        target_rois = rpn_rois
    # Generate detection targets
    # Subsamples proposals and generates target outputs for training
    # Note that proposal class IDs, gt_boxes, and gt_masks are zero
    # padded. Equally, returned rois and targets are zero padded.
    rois, target_class_ids, target_bbox, target_mask =\
        DetectionTargetLayer(config, name="proposal_targets")([
            target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])
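active_class_ids: as the linked post shows, every class is active; its shape is (1, 81);
target_rois: in [MaskRCNN] Source Code Series 3: the RPN in train & test, the final output is rpn_rois, the top 2000 boxes selected by the RPN.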
DetectionTargetLayer
class DetectionTargetLayer(KE.Layer):
    def __init__(self, config, **kwargs):
        super(DetectionTargetLayer, self).__init__(**kwargs)
        self.config = config

    def call(self, inputs):
        proposals = inputs[0]
        gt_class_ids = inputs[1]
        gt_boxes = inputs[2]
        gt_masks = inputs[3]
        # Slice the batch and run a graph for each slice
        # TODO: Rename target_bbox to target_deltas for clarity
        names = ["rois", "target_class_ids", "target_bbox", "target_mask"]
        outputs = utils.batch_slice(
            [proposals, gt_class_ids, gt_boxes, gt_masks],
            lambda w, x, y, z: detection_targets_graph(
                w, x, y, z, self.config),
            self.config.IMAGES_PER_GPU, names=names)
        return outputs

    def compute_output_shape(self, input_shape):
        return [
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # rois
            (None, self.config.TRAIN_ROIS_PER_IMAGE),  # class_ids
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # deltas
            (None, self.config.TRAIN_ROIS_PER_IMAGE, self.config.MASK_SHAPE[0],
             self.config.MASK_SHAPE[1])  # masks
        ]

    def compute_mask(self, inputs, mask=None):
        return [None, None, None, None]
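Input parameters
target_rois: in [MaskRCNN] Source Code Series 3: the RPN in train & test, the final output is rpn_rois, the top 2000 boxes selected by the RPN; its shape is (?, 2000, 4);
The next three variables are model inputs. For the values that are fed in, see [MaskRCNN] Source Code Series 1: train data processing, part 3:
input_gt_class_ids: batch_gt_class_ids, shape (1, 100); 100 is the maximum number of instances allowed in one image.
gt_boxes: batch_gt_boxes, shape (1, 100, 4)
input_gt_masks: batch_gt_masks, shape (1, 56, 56, 100)
The DetectionTargetLayer class is a thin wrapper: call() slices the batch and delegates the real work to detection_targets_graph.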
detection_targets_graph
def detection_targets_graph(proposals, gt_class_ids, gt_boxes, gt_masks, config):
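Note that the variables below no longer have the first (batch size) dimension, because batch_slice processes one image at a time.
proposals: the target_rois described above
gt_class_ids: the input_gt_class_ids described above
gt_boxes: the gt_boxes described above
gt_masks: the input_gt_masks described above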
    # Assertions
    asserts = [
        tf.Assert(tf.greater(tf.shape(proposals)[0], 0), [proposals],
                  name="roi_assertion"),
    ]
    with tf.control_dependencies(asserts):
        proposals = tf.identity(proposals)
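This block asserts that at least one proposal was passed in (tf.shape(proposals)[0] > 0). Wrapping tf.identity in tf.control_dependencies forces the assertion to run before proposals is used by the rest of the graph.
Removing zero padding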
    # Remove zero padding
    proposals, _ = trim_zeros_graph(proposals, name="trim_proposals")
    gt_boxes, non_zeros = trim_zeros_graph(gt_boxes, name="trim_gt_boxes")
    gt_class_ids = tf.boolean_mask(gt_class_ids, non_zeros,
                                   name="trim_gt_class_ids")
    gt_masks = tf.gather(gt_masks, tf.where(non_zeros)[:, 0], axis=2,
                         name="trim_gt_masks")
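This block removes the zero rows that were added earlier to pad the tensors to a fixed shape. The results are:
proposals: the refined proposals with zero rows removed, at most 2000 of them;
gt_boxes: the instances actually present in the image, shape (?, 4) with ? <= 100;
non_zeros: shape (100,), one entry per padded input row; True for non-zero rows and False for zero rows;
gt_class_ids: shape (?,), the class of each instance kept by non_zeros;
gt_masks: shape (56, 56, ?), the mask of each instance kept by non_zeros.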
trim_zeros_graph
def trim_zeros_graph(boxes, name='trim_zeros'):
    """Often boxes are represented with matrices of shape [N, 4] and
    are padded with zeros. This removes zero boxes.

    boxes: [N, 4] matrix of boxes.
    non_zeros: [N] a 1D boolean mask identifying the rows to keep
    """
    non_zeros = tf.cast(tf.reduce_sum(tf.abs(boxes), axis=1), tf.bool)
    boxes = tf.boolean_mask(boxes, non_zeros, name=name)
    return boxes, non_zeros
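The input has shape (?, 4) and holds box coordinates. The function takes absolute values and sums along axis 1; rows whose sum is greater than 0 become True and all-zero rows become False. tf.boolean_mask then keeps only the True rows, so the returned boxes have shape (M, 4), where M <= ? is the number of non-zero rows.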
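To make the effect concrete, here is a minimal pure-Python sketch of the same trimming logic. It is not the library code: trim_zeros is a hypothetical helper that works on plain lists instead of tensors, and the sample boxes are made up.

```python
def trim_zeros(boxes):
    """Drop all-zero rows from a list of [y1, x1, y2, x2] boxes.

    Returns the kept boxes and a boolean mask over the input rows,
    mirroring the (boxes, non_zeros) pair of trim_zeros_graph.
    """
    # A row is padding when the sum of its absolute values is zero.
    non_zeros = [sum(abs(v) for v in box) > 0 for box in boxes]
    kept = [box for box, keep in zip(boxes, non_zeros) if keep]
    return kept, non_zeros


# Two real boxes followed and interleaved with zero padding (made-up values).
padded = [[10, 20, 50, 80], [0, 0, 0, 0], [5, 5, 30, 40], [0, 0, 0, 0]]
kept, mask = trim_zeros(padded)
print(kept)  # [[10, 20, 50, 80], [5, 5, 30, 40]]
print(mask)  # [True, False, True, False]
```

The mask is what lets the class IDs and masks be trimmed consistently with the boxes.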
Handling crowd instances
    # Handle COCO crowds
    # A crowd box in COCO is a bounding box around several instances. Exclude
    # them from training. A crowd box is given a negative class ID.
    crowd_ix = tf.where(gt_class_ids < 0)[:, 0]
    non_crowd_ix = tf.where(gt_class_ids > 0)[:, 0]
    crowd_boxes = tf.gather(gt_boxes, crowd_ix)
    gt_class_ids = tf.gather(gt_class_ids, non_crowd_ix)
    gt_boxes = tf.gather(gt_boxes, non_crowd_ix)
    gt_masks = tf.gather(gt_masks, non_crowd_ix, axis=2)
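A negative gt_class_ids entry marks a crowd box; a positive entry is the class of a real instance in the image. The shapes are:
crowd_ix: (?,) where ? is the number of crowd instances
non_crowd_ix: (?,) where ? is the number of normal instances
crowd_boxes: (?, 4) where ? is the number of crowd instances
gt_class_ids: (?,) where ? is the number of normal instances
gt_boxes: (?, 4) where ? is the number of normal instances
gt_masks: (56, 56, ?) where ? is the number of normal instances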
    # Compute overlaps with crowd boxes [proposals, crowd_boxes]
    crowd_overlaps = overlaps_graph(proposals, crowd_boxes)
    crowd_iou_max = tf.reduce_max(crowd_overlaps, axis=1)
    no_crowd_bool = (crowd_iou_max < 0.001)
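This computes the IoU between the proposals and the crowd boxes, then takes, for each proposal, its maximum IoU over all crowd boxes. If that maximum is below 0.001, the proposal does not touch a crowd region and remains eligible as a negative sample later on.
Computing overlaps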
    # Compute overlaps matrix [proposals, gt_boxes]
    overlaps = overlaps_graph(proposals, gt_boxes)
def overlaps_graph(boxes1, boxes2):
    """Computes IoU overlaps between two sets of boxes.
    boxes1, boxes2: [N, (y1, x1, y2, x2)].
    """
    # 1. Tile boxes2 and repeat boxes1. This allows us to compare
    # every boxes1 against every boxes2 without loops.
    # TF doesn't have an equivalent to np.repeat() so simulate it
    # using tf.tile() and tf.reshape.
    # b1 has shape (N1*N2, 4): the first box of boxes1 is repeated N2 times,
    # then the second box N2 times, and so on for all N1 boxes.
    b1 = tf.reshape(tf.tile(tf.expand_dims(boxes1, 1),
                            [1, 1, tf.shape(boxes2)[0]]), [-1, 4])
    # b2 has shape (N1*N2, 4): the whole list of N2 boxes is repeated N1 times.
    b2 = tf.tile(boxes2, [tf.shape(boxes1)[0], 1])
    # 2. Compute intersections
    b1_y1, b1_x1, b1_y2, b1_x2 = tf.split(b1, 4, axis=1)
    b2_y1, b2_x1, b2_y2, b2_x2 = tf.split(b2, 4, axis=1)
    y1 = tf.maximum(b1_y1, b2_y1)
    x1 = tf.maximum(b1_x1, b2_x1)
    y2 = tf.minimum(b1_y2, b2_y2)
    x2 = tf.minimum(b1_x2, b2_x2)
    intersection = tf.maximum(x2 - x1, 0) * tf.maximum(y2 - y1, 0)
    # 3. Compute unions
    b1_area = (b1_y2 - b1_y1) * (b1_x2 - b1_x1)
    b2_area = (b2_y2 - b2_y1) * (b2_x2 - b2_x1)
    union = b1_area + b2_area - intersection
    # 4. Compute IoU and reshape to [boxes1, boxes2]
    iou = intersection / union
    # IoU of every ROI box with every GT box
    overlaps = tf.reshape(iou, [tf.shape(boxes1)[0], tf.shape(boxes2)[0]])
    return overlaps
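This computes the IoU between the (at most 2000) proposals and the box of every non-crowd instance in the image.
Input parameters
proposals: the refined proposals with zero rows removed, at most 2000; call their number N1;
gt_boxes: box coordinates of the non-crowd instances; call their number N2.
Return value
overlaps: the IoU of every ROI box (proposal) with every GT box, so its shape is (N1, N2).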
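The same (N1, N2) IoU matrix can be sketched with plain loops, which is easier to follow than the tile/reshape trick. This is an illustrative pure-Python version, not the library code; iou and overlaps_matrix are hypothetical names and the boxes are made-up values.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [y1, x1, y2, x2]."""
    y1 = max(box_a[0], box_b[0])
    x1 = max(box_a[1], box_b[1])
    y2 = min(box_a[2], box_b[2])
    x2 = min(box_a[3], box_b[3])
    # Clamp at 0 so disjoint boxes get an empty intersection.
    intersection = max(x2 - x1, 0) * max(y2 - y1, 0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union


def overlaps_matrix(boxes1, boxes2):
    """Row i, column j holds the IoU of boxes1[i] with boxes2[j]."""
    return [[iou(a, b) for b in boxes2] for a in boxes1]


proposals = [[0, 0, 2, 2], [1, 1, 3, 3]]   # N1 = 2
gt_boxes = [[0, 0, 2, 2], [2, 2, 4, 4]]    # N2 = 2
matrix = overlaps_matrix(proposals, gt_boxes)
print(matrix[0])  # [1.0, 0.0]
print(matrix[1])  # both entries are 1/7: a 1x1 intersection over a union of 7
```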
Candidate positive and negative ROIs
    # Determine positive and negative ROIs
    roi_iou_max = tf.reduce_max(overlaps, axis=1)
    # 1. Positive ROIs are those with >= 0.5 IoU with a GT box
    positive_roi_bool = (roi_iou_max >= 0.5)
    positive_indices = tf.where(positive_roi_bool)[:, 0]
    # 2. Negative ROIs are those with < 0.5 with every GT box. Skip crowds.
    negative_indices = tf.where(tf.logical_and(roi_iou_max < 0.5, no_crowd_bool))[:, 0]
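For each proposal, take its maximum IoU over all gt_boxes in overlaps. If that maximum is greater than or equal to 0.5, the proposal is a candidate positive. If it is below 0.5 and the proposal does not overlap a crowd box, it is a candidate negative.
positive_indices: indices of the candidate positive samples;
negative_indices: indices of the candidate negative samples.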
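The selection rule can be sketched in pure Python as follows. split_rois is a hypothetical helper, the 0.5 and 0.001 thresholds are the ones from the code above, and the sketch assumes every row of overlaps has at least one GT box.

```python
def split_rois(overlaps, crowd_iou_max, iou_threshold=0.5, crowd_threshold=0.001):
    """Return (positive_indices, negative_indices) for a list of IoU rows.

    overlaps[i] holds the IoUs of proposal i with every GT box;
    crowd_iou_max[i] is its largest IoU with any crowd box.
    """
    positive, negative = [], []
    for i, row in enumerate(overlaps):
        roi_iou_max = max(row)
        if roi_iou_max >= iou_threshold:
            positive.append(i)
        elif crowd_iou_max[i] < crowd_threshold:
            # Below the IoU threshold and not inside a crowd region.
            negative.append(i)
    return positive, negative


overlaps = [[0.7, 0.1], [0.5, 0.0], [0.2, 0.3], [0.0, 0.1]]
crowd_iou_max = [0.0, 0.0, 0.0, 0.4]
positive, negative = split_rois(overlaps, crowd_iou_max)
print(positive)  # [0, 1]
print(negative)  # [2]
```

Proposal 3 ends up in neither list: its IoU with every GT box is too low for a positive, and it overlaps a crowd box too much to be a safe negative.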
Subsampling positive and negative ROIs for training
    # Subsample ROIs. Aim for 33% positive
    # Positive ROIs
    positive_count = int(config.TRAIN_ROIS_PER_IMAGE *
                         config.ROI_POSITIVE_RATIO)  # 200 * 0.33
    positive_indices = tf.random_shuffle(positive_indices)[:positive_count]
    positive_count = tf.shape(positive_indices)[0]
    # Negative ROIs. Add enough to maintain positive:negative ratio.
    r = 1.0 / config.ROI_POSITIVE_RATIO
    negative_count = tf.cast(r * tf.cast(positive_count, tf.float32), tf.int32) - positive_count
    negative_indices = tf.random_shuffle(negative_indices)[:negative_count]
    # Gather selected ROIs
    positive_rois = tf.gather(proposals, positive_indices)
    negative_rois = tf.gather(proposals, negative_indices)
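At most 200 ROIs take part in the subsequent training. Up to int(200 * 0.33) = 66 of the candidate positives are kept, and positive_count is then reset to the number actually kept. The number of negatives is negative_count = (1 / 0.33) * positive_count - positive_count, which equals (1 / 0.33) * (200 * 0.33) - (200 * 0.33) when the positive quota is filled.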
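A small arithmetic sketch shows how the two counts interact. subsample_counts is a hypothetical helper; the defaults 200 and 0.33 stand for TRAIN_ROIS_PER_IMAGE and ROI_POSITIVE_RATIO as used in this walkthrough, and the sketch ignores the random shuffling and the fact that fewer negatives than requested may be available.

```python
def subsample_counts(num_positive_candidates,
                     train_rois_per_image=200,
                     roi_positive_ratio=0.33):
    """Return (positive_count, negative_count) as requested by the sampler."""
    max_positive = int(train_rois_per_image * roi_positive_ratio)
    # Slicing [:max_positive] keeps at most max_positive candidates.
    positive_count = min(num_positive_candidates, max_positive)
    r = 1.0 / roi_positive_ratio
    # int() truncates, like the cast to int32 in the graph code.
    negative_count = int(r * positive_count) - positive_count
    return positive_count, negative_count


print(subsample_counts(10))       # (10, 20)
print(subsample_counts(500)[0])   # 66
```

With only 10 positives available, just 20 negatives are requested, so the ROI list stays well short of 200 and the rest is zero padding.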
Assigning labels to the training ROIs
    # Assign positive ROIs to GT boxes, i.e. find the GT box coordinates
    # and class that belong to each positive ROI.
    positive_overlaps = tf.gather(overlaps, positive_indices)
    roi_gt_box_assignment = tf.cond(
        tf.greater(tf.shape(positive_overlaps)[1], 0),
        true_fn = lambda: tf.argmax(positive_overlaps, axis=1),
        false_fn = lambda: tf.cast(tf.constant([]),tf.int64)
    )
    roi_gt_boxes = tf.gather(gt_boxes, roi_gt_box_assignment)
    roi_gt_class_ids = tf.gather(gt_class_ids, roi_gt_box_assignment)
For each positive training ROI, take its row of IoUs against gt_boxes and find the GT box with the highest IoU. That GT box's coordinates and class become the label of the ROI.
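The assignment is a row-wise argmax followed by two gathers. A minimal pure-Python sketch, with assign_gt as a hypothetical helper and made-up values:

```python
def assign_gt(positive_overlaps, gt_boxes, gt_class_ids):
    """Pick, for every positive ROI, the GT box with the highest IoU."""
    # Index of the largest IoU in each row (first one wins on ties).
    assignment = [max(range(len(row)), key=row.__getitem__)
                  for row in positive_overlaps]
    roi_gt_boxes = [gt_boxes[j] for j in assignment]
    roi_gt_class_ids = [gt_class_ids[j] for j in assignment]
    return assignment, roi_gt_boxes, roi_gt_class_ids


positive_overlaps = [[0.6, 0.8], [0.9, 0.1], [0.55, 0.7]]
gt_boxes = [[0, 0, 2, 2], [2, 2, 4, 4]]
gt_class_ids = [3, 7]
assignment, roi_gt_boxes, roi_gt_class_ids = assign_gt(
    positive_overlaps, gt_boxes, gt_class_ids)
print(assignment)        # [1, 0, 1]
print(roi_gt_class_ids)  # [7, 3, 7]
```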
Computing the offsets between positive ROIs and their GT boxes
    # Compute bbox refinement for positive ROIs
    deltas = utils.box_refinement_graph(positive_rois, roi_gt_boxes)
    deltas /= config.BBOX_STD_DEV
def box_refinement_graph(box, gt_box):
    """Compute refinement needed to transform box to gt_box.
    box and gt_box are [N, (y1, x1, y2, x2)]
    """
    box = tf.cast(box, tf.float32)
    gt_box = tf.cast(gt_box, tf.float32)
    height = box[:, 2] - box[:, 0]
    width = box[:, 3] - box[:, 1]
    center_y = box[:, 0] + 0.5 * height
    center_x = box[:, 1] + 0.5 * width
    gt_height = gt_box[:, 2] - gt_box[:, 0]
    gt_width = gt_box[:, 3] - gt_box[:, 1]
    gt_center_y = gt_box[:, 0] + 0.5 * gt_height
    gt_center_x = gt_box[:, 1] + 0.5 * gt_width
    dy = (gt_center_y - center_y) / height
    dx = (gt_center_x - center_x) / width
    dh = tf.log(gt_height / height)
    dw = tf.log(gt_width / width)
    result = tf.stack([dy, dx, dh, dw], axis=1)
    return result
This computes the offsets (deltas) between each positive training ROI and its GT box. The result has shape (?, 4).
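A worked example for a single box may help. The sketch below repeats the formulas of box_refinement_graph in pure Python; box_refinement is a hypothetical single-box helper, and BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2] is an assumed configuration value, so check your own config before relying on the numbers.

```python
import math

# Assumed value of config.BBOX_STD_DEV, used only for this illustration.
BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]


def box_refinement(box, gt_box):
    """Return [dy, dx, dh, dw] that maps box onto gt_box ([y1, x1, y2, x2])."""
    height = box[2] - box[0]
    width = box[3] - box[1]
    center_y = box[0] + 0.5 * height
    center_x = box[1] + 0.5 * width
    gt_height = gt_box[2] - gt_box[0]
    gt_width = gt_box[3] - gt_box[1]
    gt_center_y = gt_box[0] + 0.5 * gt_height
    gt_center_x = gt_box[1] + 0.5 * gt_width
    # Center shifts are relative to the ROI size; sizes use a log ratio.
    dy = (gt_center_y - center_y) / height
    dx = (gt_center_x - center_x) / width
    dh = math.log(gt_height / height)
    dw = math.log(gt_width / width)
    return [dy, dx, dh, dw]


roi = [0.0, 0.0, 0.5, 0.5]
gt = [0.25, 0.25, 0.75, 0.75]
deltas = box_refinement(roi, gt)
print(deltas)  # [0.5, 0.5, 0.0, 0.0]
normalized = [d / s for d, s in zip(deltas, BBOX_STD_DEV)]
print(normalized)  # roughly [5, 5, 0, 0]
```

The GT box has the same size as the ROI but is shifted by half the ROI size, so only dy and dx are non-zero.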
Assigning masks to the training ROIs
    # Assign positive ROIs to GT masks
    # Permute masks to [N, height, width, 1]
    transposed_masks = tf.expand_dims(tf.transpose(gt_masks, [2, 0, 1]), -1)
    # Pick the right mask for each ROI
    roi_masks = tf.gather(transposed_masks, roi_gt_box_assignment)
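gt_masks has shape (56, 56, ?). Transposing it and adding a trailing dimension gives transposed_masks with shape (?, 56, 56, 1).
Each training ROI is then assigned the mask of its matched GT instance, which gives roi_masks.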
    # Compute mask targets
    boxes = positive_rois
    if config.USE_MINI_MASK:
        # Transform ROI coordinates from normalized image space
        # to normalized mini-mask space.
        y1, x1, y2, x2 = tf.split(positive_rois, 4, axis=1)
        gt_y1, gt_x1, gt_y2, gt_x2 = tf.split(roi_gt_boxes, 4, axis=1)
        gt_h = gt_y2 - gt_y1
        gt_w = gt_x2 - gt_x1
        y1 = (y1 - gt_y1) / gt_h
        x1 = (x1 - gt_x1) / gt_w
        y2 = (y2 - gt_y1) / gt_h
        x2 = (x2 - gt_x1) / gt_w
        boxes = tf.concat([y1, x1, y2, x2], 1)
    box_ids = tf.range(0, tf.shape(roi_masks)[0])
    masks = tf.image.crop_and_resize(tf.cast(roi_masks, tf.float32), boxes,
                                     box_ids,
                                     config.MASK_SHAPE)
    # Remove the extra dimension from masks.
    masks = tf.squeeze(masks, axis=3)
    # Threshold mask pixels at 0.5 to have GT masks be 0 or 1 to use with
    # binary cross entropy loss.
    masks = tf.round(masks)
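As the figure above shows, a training ROI and its target do not overlap perfectly, so the part of the GT mask that falls inside the ROI (the shaded area) is cropped out and resized to the 28*28 size used later. squeeze then removes the extra dimension, giving masks with shape (?, 28, 28).
If tf.image.crop_and_resize is unfamiliar, see https://blog.csdn.net/u013066730/article/details/100583484.
Resizing produces values other than 0 and 1, but the labels must be 0 or 1, so round is applied: values above 0.5 become 1 and values below 0.5 become 0 (tf.round rounds exact halves to the nearest even integer, so exactly 0.5 becomes 0).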
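When USE_MINI_MASK is on, the stored mask covers only the GT box, so the ROI first has to be expressed relative to that box. The sketch below works through the coordinate change for one ROI; to_mini_mask_space is a hypothetical helper and the coordinates are made-up normalized values.

```python
def to_mini_mask_space(roi, gt_box):
    """Express roi ([y1, x1, y2, x2]) in the coordinate frame of gt_box.

    In that frame the GT box itself is [0, 0, 1, 1].
    """
    y1, x1, y2, x2 = roi
    gt_y1, gt_x1, gt_y2, gt_x2 = gt_box
    gt_h = gt_y2 - gt_y1
    gt_w = gt_x2 - gt_x1
    return [(y1 - gt_y1) / gt_h, (x1 - gt_x1) / gt_w,
            (y2 - gt_y1) / gt_h, (x2 - gt_x1) / gt_w]


gt_box = [0.25, 0.25, 0.75, 0.75]
roi = [0.5, 0.5, 1.0, 1.0]
print(to_mini_mask_space(roi, gt_box))     # [0.5, 0.5, 1.5, 1.5]
print(to_mini_mask_space(gt_box, gt_box))  # [0.0, 0.0, 1.0, 1.0]
```

Here the ROI starts in the middle of the GT box and extends past its lower-right corner, so the transformed coordinates go beyond 1. Only the part inside [0, 1] lies on the stored mask.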
Normalizing the output shapes
    # Append negative ROIs and pad bbox deltas and masks that
    # are not used for negative ROIs with zeros.
    rois = tf.concat([positive_rois, negative_rois], axis=0)
    N = tf.shape(negative_rois)[0]
    P = tf.maximum(config.TRAIN_ROIS_PER_IMAGE - tf.shape(rois)[0], 0)
    rois = tf.pad(rois, [(0, P), (0, 0)])
    roi_gt_boxes = tf.pad(roi_gt_boxes, [(0, N + P), (0, 0)])
    roi_gt_class_ids = tf.pad(roi_gt_class_ids, [(0, N + P)])
    deltas = tf.pad(deltas, [(0, N + P), (0, 0)])
    masks = tf.pad(masks, [[0, N + P], (0, 0), (0, 0)])
    return rois, roi_gt_class_ids, deltas, masks
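rois contains both positive and negative samples. The target is 200 rows, but the steps above may yield fewer, so rois is zero-padded up to 200 rows to get a fixed shape.
roi_gt_boxes, roi_gt_class_ids, deltas and masks exist only for positive samples, so they are padded with N + P zero rows (N for the negatives plus P padding rows), which again brings each of them to 200 rows.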
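The bookkeeping with N and P can be checked on a toy case. The sketch below is pure Python with a hypothetical pad_targets helper and a made-up limit of 8 ROIs per image instead of 200.

```python
def pad_targets(positive_rois, negative_rois, class_ids, train_rois_per_image):
    """Concatenate ROIs and zero-pad ROIs and class IDs to a fixed length."""
    rois = positive_rois + negative_rois
    n = len(negative_rois)                         # N: number of negatives
    p = max(train_rois_per_image - len(rois), 0)   # P: missing rows
    rois = rois + [[0.0, 0.0, 0.0, 0.0] for _ in range(p)]
    # Labels exist only for positives, so negatives get zeros as well.
    class_ids = class_ids + [0] * (n + p)
    return rois, class_ids


positive_rois = [[0.1, 0.1, 0.4, 0.4], [0.5, 0.5, 0.9, 0.9]]
negative_rois = [[0.0, 0.0, 0.2, 0.2], [0.6, 0.1, 0.8, 0.3], [0.3, 0.7, 0.5, 0.9]]
rois, class_ids = pad_targets(positive_rois, negative_rois, [3, 7], 8)
print(len(rois))   # 8
print(class_ids)   # [3, 7, 0, 0, 0, 0, 0, 0]
```

Class ID 0 therefore marks both negatives (background) and padding rows; only the first two rows carry real labels.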
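In the end, DetectionTargetLayer produces the following variables:
rois: (1, 200, 4); the 200 rows consist of three parts: positives, negatives and zero padding;
target_class_ids: (1, 200); the 200 entries consist of two parts: positives and zero padding;
target_bbox: (1, 200, 4); the 200 rows consist of two parts: positives and zero padding;
target_mask: (1, 200, 28, 28); the 200 masks consist of two parts: positives and zero padding.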
fpn_classifier_graph
Back in the build function:
    # Network Heads
    # TODO: verify that this handles zero padded ROIs
    mrcnn_class_logits, mrcnn_class, mrcnn_bbox =\
        fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                             config.POOL_SIZE, config.NUM_CLASSES,
                             train_bn=config.TRAIN_BN,
                             fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)
def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Pooling
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
    # Two 1024 FC layers (implemented with Conv2D for consistency)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)
    # Classifier head
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)
    # BBox head
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox
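Input parameters
rois: (1, 200, 4); the 200 rows consist of three parts: positives, negatives and zero padding;
feature_maps: [P2, P3, P4, P5]; see [MaskRCNN] Source Code Series 2: train & test feature maps;
image_meta: records how the original image was rescaled during preprocessing; see https://blog.csdn.net/u013066730/article/details/102501128;
pool_size: 7;
num_classes: 81;
train_bn: False;
fc_layers_size: 1024.
The key building blocks of this network are introduced first, followed by its overall structure.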
PyramidROIAlign
class PyramidROIAlign(KE.Layer):
    def __init__(self, pool_shape, **kwargs):
        super(PyramidROIAlign, self).__init__(**kwargs)
        self.pool_shape = tuple(pool_shape)
The following code shows how the layer is used:
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
The layer is constructed with pool_size = 7 and then called on rois, image_meta and feature_maps.
The main logic is in the call function:
    def call(self, inputs):
        # Crop boxes [batch, num_boxes, (y1, x1, y2, x2)] in normalized coords
        boxes = inputs[0]
        # Image meta
        # Holds details about the image. See compose_image_meta()
        image_meta = inputs[1]
        # Feature Maps. List of feature maps from different level of the
        # feature pyramid. Each is [batch, height, width, channels]
        feature_maps = inputs[2:]
        # Assign each ROI to a level in the pyramid based on the ROI area.
        y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
        h = y2 - y1
        w = x2 - x1
        # Use shape of first image. Images in a batch must have the same size.
        image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]
        # Equation 1 in the Feature Pyramid Networks paper. Account for
        # the fact that our coordinates are normalized here.
        # e.g. a 224x224 ROI (in pixels) maps to P4
        image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
        roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
        roi_level = tf.minimum(5, tf.maximum(
            2, 4 + tf.cast(tf.round(roi_level), tf.int32)))  # shape is (1,?,1)
        roi_level = tf.squeeze(roi_level, 2)
        # Loop through levels and apply ROI pooling to each. P2 to P5.
        pooled = []
        box_to_level = []
        for i, level in enumerate(range(2, 6)):
            # ix has shape (?, 2): the indices of the ROIs in the batch that
            # were assigned to this pyramid level. ? is how many there are,
            # and each row holds 2 values.
            # E.g. with ? = 10 and a batch of 3, ix has shape (10, 2).
            # Column 0 is the image index within the batch (0 to 2);
            # column 1 is the index of the ROI within that image.
            ix = tf.where(tf.equal(roi_level, level))
            # Gather the matching ROIs (image index, ROI index) from boxes.
            level_boxes = tf.gather_nd(boxes, ix)
            # Which image in the batch each of these ROIs belongs to.
            # Box indices for crop_and_resize.
            box_indices = tf.cast(ix[:, 0], tf.int32)
            # Keep track of which box is mapped to which level
            box_to_level.append(ix)
            # Stop gradient propogation to ROI proposals
            level_boxes = tf.stop_gradient(level_boxes)
            box_indices = tf.stop_gradient(box_indices)
            # Crop and Resize
            # Result: [batch * num_boxes, pool_height, pool_width, channels]
            # E.g. with a batch of 3, box_indices might be
            # [0,2,0,0,1,0,1,2,0,2,2,2,1], where 0 means image 0 of the batch.
            # level_boxes has shape (?, 4); ? equals the length of box_indices.
            # Each row is one ROI, from image n of the batch, at this level.
            # feature_maps[i] is the feature map of the corresponding level.
            pooled.append(tf.image.crop_and_resize(
                feature_maps[i], level_boxes, box_indices, self.pool_shape,
                method="bilinear"))
        # Pack pooled features into one tensor, shape is (?,7,7,256)
        pooled = tf.concat(pooled, axis=0)
        # Pack box_to_level mapping into one array and add another
        # column representing the order of pooled boxes
        # box_to_level might start as [[0,2],[0,13],[0,21],[1,5],[2,8],[2,10]...];
        # after the steps below it becomes
        # [[0,2,0],[0,13,1],[0,21,2],[1,5,3],[2,8,4],[2,10,5]...]
        box_to_level = tf.concat(box_to_level, axis=0)  # concatenate the four (?,2) tensors
        box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
        box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                 axis=1)
        # Rearrange pooled features to match the order of the original boxes
        # Sort box_to_level by batch then box index
        # TF doesn't have a way to sort by two columns, so merge them and sort.
        # sorting_tensor spreads the image indices apart: image 0 keeps its
        # box index, image 1 adds 100000, image 2 adds 200000, and so on.
        sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
        # Sort ascending, i.e. first by image index within the batch,
        # then by box index.
        ix = tf.nn.top_k(sorting_tensor, k=tf.shape(
            box_to_level)[0]).indices[::-1]
        # Reorder the pooled-row positions (column 2 of box_to_level)
        # into that order.
        ix = tf.gather(box_to_level[:, 2], ix)
        # Then reorder pooled to match.
        pooled = tf.gather(pooled, ix)
        # Re-add the batch dimension
        # shape is the batch size and the number of ROIs per image (200)
        # taken from boxes, followed by the pooled shape 7*7*256.
        shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
        # Reshape to (b,200,7,7,256)
        pooled = tf.reshape(pooled, shape)
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1], )
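The re-ordering at the end of call() is the least obvious step: pooled rows come out grouped by pyramid level and must be put back into (image, box) order. A pure-Python sketch of the merged-key sort, with restore_order as a hypothetical helper and made-up indices:

```python
def restore_order(box_to_level):
    """Return the positions of pooled rows in (image index, box index) order.

    box_to_level lists one (image_index, box_index) pair per pooled row,
    in the order the rows were pooled (grouped by pyramid level).
    """
    # Merge both columns into one sortable key, as the graph code does.
    keys = [image * 100000 + box for image, box in box_to_level]
    return sorted(range(len(keys)), key=lambda position: keys[position])


# Pooled order: level 2 produced the first two rows, level 3 the next two,
# level 4 the last one.
box_to_level = [(0, 2), (1, 0), (0, 0), (1, 1), (0, 1)]
order = restore_order(box_to_level)
print(order)                              # [2, 4, 0, 1, 3]
print([box_to_level[i] for i in order])   # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

The merged key only works while every image has fewer than 100000 boxes, which holds comfortably for 200 ROIs per image.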
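First, one question needs answering. The rois obtained so far (whose heights and widths mostly differ after the RPN) come from the FPN levels (P2-P5), and each level uses a different default anchor size, for example (16, 32, 64, 128, 256). So which feature map (among those of P2-P5) does a given ROI come from? In other words, on which feature map should ROIAlign be applied for a given ROI? Because so many anchors were generated earlier, the level each one came from was not recorded, so the paper proposes an approximate formula (Equation 1 of the original FPN paper): k = floor(k0 + log2(sqrt(w * h) / 224)), with k0 = 4, where w and h are the width and height of the ROI in pixels.
With that in mind the code is much easier to read. As for ROIAlign, the code simply calls tf.image.crop_and_resize. A good post on ROI pooling: https://blog.deepsense.ai/region-of-interest-pooling-explained/
Here roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area))) changes the formula slightly. Because h and w are normalized coordinates, 224 is divided by sqrt(image_area) so that both sides use the same scale, and the level becomes k = min(5, max(2, 4 + round(roi_level))).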
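The level assignment can be tried out on a few ROI sizes. The sketch below re-implements the formula from call() in pure Python; roi_pyramid_level is a hypothetical helper and the 1024 x 1024 image size is an assumption made only for this example.

```python
import math


def roi_pyramid_level(h, w, image_height, image_width):
    """Pyramid level (2 to 5) for an ROI with normalized height h and width w."""
    image_area = float(image_height * image_width)
    # 224 is in pixels, so it is normalized by sqrt(image_area) as well.
    level = math.log2(math.sqrt(h * w) / (224.0 / math.sqrt(image_area)))
    # A 224-pixel ROI maps to P4; clamp the result to P2..P5.
    return min(5, max(2, 4 + int(round(level))))


image_size = 1024  # assumed square input image
levels = []
for side_in_pixels in (28, 56, 112, 224, 448, 896):
    side = side_in_pixels / image_size  # normalized side length
    levels.append(roi_pyramid_level(side, side, image_size, image_size))
print(levels)  # [2, 2, 3, 4, 5, 5]
```

Halving the side of an ROI moves it one level down the pyramid to a finer feature map, and very small or very large ROIs are clamped to P2 and P5.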
TimeDistributed
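For details see https://blog.csdn.net/u013066730/article/details/100737059.
After PyramidROIAlign the tensor has shape (?, 200, 7, 7, 256). An ordinary 2D convolution does not fit because there is one dimension too many, and a 3D convolution is not right either. What is wanted is a single convolution applied to each of the 200 ROIs. TimeDistributed does exactly that: the parameters are shared along the dimension of size 200.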
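The idea of sharing one set of weights across the ROI dimension can be illustrated without Keras. The sketch below is a conceptual pure-Python stand-in, not the Keras implementation: dense and time_distributed are hypothetical helpers, and a tiny dense layer replaces the convolution.

```python
def dense(vector, weights):
    """Multiply a feature vector by a weight matrix given as [in][out]."""
    return [sum(v * w for v, w in zip(vector, column))
            for column in zip(*weights)]


def time_distributed(layer_fn, batch):
    """Apply layer_fn to every ROI of every image: [batch][num_rois][features]."""
    return [[layer_fn(roi) for roi in image] for image in batch]


# One weight matrix (3 inputs, 2 outputs) shared by all ROIs.
weights = [[1, 0], [0, 2], [1, 1]]
batch = [[[1, 2, 3], [0, 1, 0]]]  # 1 image, 2 ROIs, 3 features each
output = time_distributed(lambda roi: dense(roi, weights), batch)
print(output)  # [[[4, 7], [0, 2]]]
```

The batch and ROI dimensions pass through unchanged; only the per-ROI feature dimension is transformed, which matches the shape changes listed for the TimeDistributed layers in this network.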
Now back to how the tensor shapes change through the whole network (the batch size is set to 1 for this example):
layer | input_shape | output_shape |
PyramidROIAlign | [rois, image_meta] + feature_maps | (1,200,7,7,256) |
TimeDistributed-Conv2D(k=7,i=256,o=1024) | (1,200,7,7,256) | (1,200,1,1,1024) |
TimeDistributed-BN | (1,200,1,1,1024) | (1,200,1,1,1024) |
Relu | (1,200,1,1,1024) | (1,200,1,1,1024) |
TimeDistributed-Conv2D(k=1,i=1024,o=1024) | (1,200,1,1,1024) | (1,200,1,1,1024) |
TimeDistributed-BN | (1,200,1,1,1024) | (1,200,1,1,1024) |
Relu | (1,200,1,1,1024) | (1,200,1,1,1024) |
squeeze | (1,200,1,1,1024) | shared(1,200,1024) |
TimeDistributed-Dense | shared(1,200,1024) | mrcnn_class_logits(1,200,81) |
TimeDistributed-Softmax | mrcnn_class_logits(1,200,81) | mrcnn_class_probs(1,200,81) |
TimeDistributed-Dense | shared(1,200,1024) | x(1,200,81*4) |
reshape | x(1,200,81*4) | mrcnn_bbox(1,200,81,4) |
The function finally returns mrcnn_class_logits (1,200,81), mrcnn_class_probs (1,200,81) and mrcnn_bbox (1,200,81,4).