Preset anchors
The RPN is defined in the build function of the MaskRCNN() class in mrcnn/model.py.
# Anchors
if mode == "training":
    anchors = self.get_anchors(config.IMAGE_SHAPE)  # shape is [261888, 4]: anchors for every scale, already normalized to the image
    # Duplicate across the batch dimension because Keras requires it
    # TODO: can this be optimized to avoid duplicating the anchors?
    anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
    # A hack to get around Keras's bad support for constants
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
else:
    anchors = input_anchors
This post covers the RPN for both training and inference. As the snippet shows, the anchors obtained in the training branch and those used otherwise (input_anchors, fed in at inference time) are effectively the same: both ultimately come from the get_anchors function.
Let's take a look at get_anchors.
get_anchors
def get_anchors(self, image_shape):
    """Returns anchor pyramid for the given image size."""
    backbone_shapes = compute_backbone_shapes(self.config, image_shape)
    # Cache anchors and reuse if image shape is the same
    if not hasattr(self, "_anchor_cache"):
        self._anchor_cache = {}
    if not tuple(image_shape) in self._anchor_cache:
        # Generate Anchors
        a = utils.generate_pyramid_anchors(
            self.config.RPN_ANCHOR_SCALES,   # [32, 64, 128, 256, 512]
            self.config.RPN_ANCHOR_RATIOS,   # [0.5, 1, 2]
            backbone_shapes,                 # [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
            self.config.BACKBONE_STRIDES,    # [4, 8, 16, 32, 64]
            self.config.RPN_ANCHOR_STRIDE)   # 1
        # Keep a copy of the latest anchors in pixel coordinates because
        # it's used in inspect_model notebooks.
        # TODO: Remove this after the notebook are refactored to not use it
        self.anchors = a  # shape is [261888, 4]
        # Normalize coordinates
        self._anchor_cache[tuple(image_shape)] = utils.norm_boxes(a, image_shape[:2])
    return self._anchor_cache[tuple(image_shape)]
Here backbone_shapes is computed from the image shape and the config. From the previous post we know the RPN uses the five feature maps P2, P3, P4, P5 and P6, which line up exactly with backbone_shapes; the concrete values are noted in the code comments above.
Next, generate_pyramid_anchors is called; I won't re-derive it here, see the earlier post.
Finally the anchors are normalized once more; see norm_boxes:
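As a quick sanity check, those shapes can be reproduced in a few lines. This is only a minimal sketch of what compute_backbone_shapes does for a ResNet backbone, assuming the default 1024x1024 input:

import math
import numpy as np

# Sketch of compute_backbone_shapes for a ResNet backbone: one
# [height, width] pair per FPN level P2..P6.
IMAGE_SHAPE = [1024, 1024, 3]
BACKBONE_STRIDES = [4, 8, 16, 32, 64]   # strides of P2..P6
backbone_shapes = np.array(
    [[int(math.ceil(IMAGE_SHAPE[0] / stride)),
      int(math.ceil(IMAGE_SHAPE[1] / stride))]
     for stride in BACKBONE_STRIDES])
print(backbone_shapes)   # [[256 256] [128 128] [64 64] [32 32] [16 16]]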
def norm_boxes(boxes, shape):
    """Converts boxes from pixel coordinates to normalized coordinates.
    (Anything labelled "normalized" means the coordinates have already been normalized.)
    boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
    shape: [..., (height, width)] in pixels
    Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
    coordinates it's inside the box.
    Returns:
        [N, (y1, x1, y2, x2)] in normalized coordinates
    """
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)
Note that this normalization does not force values into 0-1; it only rescales by the image height and width. So a normalized coordinate can come out as, say, -0.04 or 1.15, slightly outside [0, 1], since the generated preset anchor boxes can extend beyond the image.
So the preset anchors finally returned lie roughly in the range -0.2 to 1.25 and have shape (261888, 4).
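A small numeric sketch with a made-up anchor makes the "slightly outside 0-1" point concrete:

import numpy as np

def norm_boxes(boxes, shape):
    # Same formula as utils.norm_boxes: shift (y2, x2) by one pixel, then
    # divide everything by (h - 1, w - 1).
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)

# A hypothetical anchor that sticks out past the border of a 1024x1024 image.
anchor = np.array([[-45.0, -45.0, 1069.0, 1069.0]])
print(norm_boxes(anchor, (1024, 1024)))
# ~[[-0.044 -0.044  1.044  1.044]] -- slightly outside [0, 1], as expected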
The RPN model
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,  # 1
                      len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)  # 3, 256
build_rpn_model
Following this call in build, let's step into build_rpn_model.
def build_rpn_model(anchor_stride, anchors_per_location, depth):  # 1, 3, 256
    input_feature_map = KL.Input(shape=[None, None, depth],
                                 name="input_rpn_feature_map")
    # inputs: (?, ?, ?, 256), 3, 1  ->  outputs: [(?, ?, 2), (?, ?, 2), (?, ?, 4)]
    outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")
Now step into rpn_graph for a closer look.
def rpn_graph(feature_map, anchors_per_location, anchor_stride):  # (?, ?, ?, 256), 3, 1
    """Builds the computation graph of Region Proposal Network.
    feature_map: backbone features [batch, height, width, depth]
    anchors_per_location: number of anchors per pixel in the feature map
    anchor_stride: Controls the density of anchors. Typically 1 (anchors for
                   every pixel in the feature map), or 2 (every other pixel).
    Returns:
        rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
        rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
        rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
                  applied to anchors.
    """
    # TODO: check if stride of 2 causes alignment issues if the feature map
    # is not even.
    # Shared convolutional base of the RPN
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)

    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)

    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)  # [batch, H * W * anchors_per_location, 2]

    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)

    # Bounding box refinement. [batch, H, W, anchors per location * depth]
    # where depth is [x, y, log(w), log(h)]
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)

    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)  # [batch, H * W * anchors_per_location, 4]

    return [rpn_class_logits, rpn_probs, rpn_bbox]
First the input arguments:
feature_map: one of the feature maps in rpn_feature_maps = [P2, P3, P4, P5, P6] from the previous post; they are fed into rpn_graph one at a time
anchors_per_location: 3
anchor_stride: 1
Taking P2 as an example, feature_map has shape (1, 256, 256, 256).
rpn_graph returns rpn_class_logits, rpn_probs and rpn_bbox. Filling in the shape for P2 and repeating the same reasoning for P3, P4, P5 and P6 gives the following table:
|    | rpn_class_logits  | rpn_probs         | rpn_bbox          |
| -- | ----------------- | ----------------- | ----------------- |
| P2 | (1, 256*256*3, 2) | (1, 256*256*3, 2) | (1, 256*256*3, 4) |
| P3 | (1, 128*128*3, 2) | (1, 128*128*3, 2) | (1, 128*128*3, 4) |
| P4 | (1, 64*64*3, 2)   | (1, 64*64*3, 2)   | (1, 64*64*3, 4)   |
| P5 | (1, 32*32*3, 2)   | (1, 32*32*3, 2)   | (1, 32*32*3, 4)   |
| P6 | (1, 16*16*3, 2)   | (1, 16*16*3, 2)   | (1, 16*16*3, 4)   |
Back in build_rpn_model, a new Keras model is built whose input is a feature map and whose output is the corresponding [rpn_class_logits, rpn_probs, rpn_bbox].
Jumping back up one level, we land at the following code in build:
# Feed each feature map through the RPN to get its outputs.
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
# shape is [3, ?]: first the logits, then the probs, then the 4 box deltas
outputs = list(zip(*layer_outputs))
outputs = [KL.Concatenate(axis=1, name=n)(list(o))
           for o, n in zip(outputs, output_names)]
rpn_class_logits, rpn_class, rpn_bbox = outputs
Each feature map is simply fed through the same RPN one by one, and the per-level outputs are concatenated, so that finally:
rpn_class_logits: shape (1, 261888, 2);
rpn_class: shape (1, 261888, 2), i.e. the logits after softmax;
rpn_bbox: shape (1, 261888, 4).
Here 261888 = 256*256*3 + 128*128*3 + 64*64*3 + 32*32*3 + 16*16*3.
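That total is easy to verify with a couple of lines (per-level cell counts times 3 ratios, anchor stride 1):

levels = [256, 128, 64, 32, 16]          # P2..P6 feature-map sizes
counts = [s * s * 3 for s in levels]     # anchors per level
print(counts)       # [196608, 49152, 12288, 3072, 768]
print(sum(counts))  # 261888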
ProposalLayer
The RPN has now predicted scores and refinements for all 261888 anchor boxes, but most of them cannot be used directly; a selection step is needed.
# Generate proposals
# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
# and zero padded.
proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training"\
    else config.POST_NMS_ROIS_INFERENCE
# Output shape is [batch, N=2000, (y1, x1, y2, x2)]: after NMS, padded with
# zeros up to 2000 if fewer boxes survive, truncated if more. Boxes never
# extend outside the image because they are clipped.
rpn_rois = ProposalLayer(
    proposal_count=proposal_count,
    nms_threshold=config.RPN_NMS_THRESHOLD,
    name="ROI",
    config=config)([rpn_class, rpn_bbox, anchors])
Here proposal_count is set to 2000.
Now let's step into the ProposalLayer layer.
class ProposalLayer(KE.Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
The initialization arguments are:
proposal_count: 2000
nms_threshold: 0.7
config: config
name: "ROI"
The layer is then called directly:
def call(self, inputs): #[rpn_class, rpn_bbox, anchors]
The input argument is
inputs: [rpn_class, rpn_bbox, anchors]
Rather than explaining it line by line, from here on I'll go through the code block by block.
def call(self, inputs):  # [rpn_class, rpn_bbox, anchors]
    # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
    scores = inputs[0][:, :, 1]  # foreground scores after softmax, shape (1, 261888)
    # Box deltas [batch, num_rois, 4]
    deltas = inputs[1]  # the deltas used to refine the box coordinates
    deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
    # Anchors
    anchors = inputs[2]  # the preset anchor boxes
The above is just variable assignment. The multiplication by self.config.RPN_BBOX_STD_DEV undoes the division applied when the RPN targets were built (see the build_rpn_targets section of https://blog.csdn.net/u013066730/article/details/102504951#build_rpn_targets); it cancels out that scaling.
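A tiny sketch of the round trip, using the default RPN_BBOX_STD_DEV = [0.1, 0.1, 0.2, 0.2]: the targets the network learns are divided by these values, so its predictions must be multiplied by them before use (the delta values here are made up):

import numpy as np

RPN_BBOX_STD_DEV = np.array([0.1, 0.1, 0.2, 0.2])

true_delta = np.array([0.05, -0.02, 0.30, 0.10])  # hypothetical (dy, dx, log(dh), log(dw))
target = true_delta / RPN_BBOX_STD_DEV            # what the RPN is trained to regress
recovered = target * RPN_BBOX_STD_DEV             # what ProposalLayer does before applying deltas
print(np.allclose(recovered, true_delta))         # True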
# Improve performance by trimming to top anchors by score
# and doing the rest on the smaller subset.
# Keep whichever is smaller: the number of input anchors or
# self.config.PRE_NMS_LIMIT = 6000.
pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                 name="top_anchors").indices
scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])
Since 261888 anchor boxes exceed 6000, only the indices of the top 6000 scores are kept (the higher the probability, the more likely the anchor is foreground). batch_slice is explained below; as for the three results:
scores: the top 6000 foreground scores selected by those indices; the higher the score, the more likely the box is foreground;
deltas: the top 6000 sets of offsets selected by those indices, produced by build_rpn_model above, with shape (1, 6000, 4);
pre_nms_anchors: the preset anchor boxes selected by those indices, i.e. the anchors most likely to contain foreground.
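The selection itself is just top-k plus gather; a toy sketch with made-up scores and batch size 1:

import tensorflow as tf

scores = tf.constant([[0.1, 0.9, 0.3, 0.7, 0.2]])              # [batch=1, num_anchors=5]
pre_nms_limit = tf.minimum(3, tf.shape(scores)[1])             # keep the 3 best
ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True).indices   # [[1, 3, 2]]
top_scores = tf.gather(scores[0], ix[0])                       # [0.9, 0.7, 0.3]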
batch_slice
Here is a quick look at batch_slice, since it is used several times later. Take [scores, ix] as an example: scores has shape (1, 261888), ix has shape (1, 6000), lambda x, y: tf.gather(x, y) is the function passed in, and batch_size is the number of images handled per GPU.
def batch_slice(inputs, graph_fn, batch_size, names=None):
    if not isinstance(inputs, list):
        inputs = [inputs]

    outputs = []
    for i in range(batch_size):
        # Take the i-th slice of every input; in this example that is the
        # i-th image's scores and its top-6000 score indices.
        inputs_slice = [x[i] for x in inputs]
        # Gather the values at those indices from this sample.
        output_slice = graph_fn(*inputs_slice)
        if not isinstance(output_slice, (tuple, list)):
            output_slice = [output_slice]
        outputs.append(output_slice)
    # Change outputs from a list of slices where each is
    # a list of outputs to a list of outputs and each has
    # a list of slices
    outputs = list(zip(*outputs))

    if names is None:
        names = [None] * len(outputs)

    result = [tf.stack(o, axis=0, name=n)
              for o, n in zip(outputs, names)]
    if len(result) == 1:
        result = result[0]

    return result
Unpacking *inputs_slice splits scores[i] and ix[i] out per image: graph_fn is applied one sample at a time, the result is stored in output_slice, and the per-sample results are then stacked back along the batch dimension to form the final result.
With batch_slice out of the way, let's keep going.
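A toy usage of batch_slice (assuming the mrcnn package is importable; the numbers are made up):

import tensorflow as tf
from mrcnn import utils

# Two images in the batch, 5 anchor scores each, and the top-2 indices per image.
scores = tf.constant([[0.1, 0.9, 0.3, 0.7, 0.2],
                      [0.5, 0.4, 0.8, 0.6, 0.1]])  # [batch=2, 5]
ix = tf.constant([[1, 3],
                  [2, 3]])                          # [batch=2, 2]

# For each image i, run tf.gather(scores[i], ix[i]) and stack the per-image
# results back along the batch dimension -> shape [2, 2].
top_scores = utils.batch_slice([scores, ix],
                               lambda x, y: tf.gather(x, y),
                               batch_size=2)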
# Apply deltas to anchors to get refined anchors.
# [batch, N, (y1, x1, y2, x2)]
boxes = utils.batch_slice([pre_nms_anchors, deltas],
                          lambda x, y: apply_box_deltas_graph(x, y),
                          self.config.IMAGES_PER_GPU,
                          names=["refined_anchors"])
This code combines the deltas predicted by build_rpn_model with the preset anchor boxes, producing refined anchor boxes; we will call these refined boxes the predicted boxes.
It relies on apply_box_deltas_graph(), so let's look at that function.
apply_box_deltas_graph
def apply_box_deltas_graph(boxes, deltas):
    """Applies the given deltas to the given boxes.
    boxes: [N, (y1, x1, y2, x2)] boxes to update
    deltas: [N, (dy, dx, log(dh), log(dw))] refinements to apply
    """
    # Convert to y, x, h, w
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Apply deltas
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= tf.exp(deltas[:, 2])
    width *= tf.exp(deltas[:, 3])
    # Convert back to y1, x1, y2, x2
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    y2 = y1 + height
    x2 = x1 + width
    result = tf.stack([y1, x1, y2, x2], axis=1, name="apply_box_deltas_out")
    return result
This is just the standard box-refinement formula:

y = Ay + Ah * dy
x = Ax + Aw * dx
h = Ah * exp(dh)
w = Aw * exp(dw)

where Ax, Ay, Aw, Ah are the center coordinates and width/height of the preset anchor box, and dx, dy, dw, dh are the predicted offsets. Refining the preset anchor boxes with these offsets gives the final predicted boxes that the RPN needs.
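A NumPy sketch of the same refinement on a single, made-up anchor, just to see the formulas in action:

import numpy as np

# Hypothetical normalized anchor (y1, x1, y2, x2) and predicted deltas.
anchor = np.array([0.20, 0.30, 0.60, 0.70])      # height 0.4, width 0.4
dy, dx, log_dh, log_dw = 0.10, -0.05, np.log(1.5), np.log(0.8)

h, w = anchor[2] - anchor[0], anchor[3] - anchor[1]
cy, cx = anchor[0] + 0.5 * h, anchor[1] + 0.5 * w

cy += dy * h                  # shift the center by a fraction of the height
cx += dx * w                  # and of the width
h *= np.exp(log_dh)           # scale height/width multiplicatively
w *= np.exp(log_dw)

refined = np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])
print(refined)                # [0.14 0.32 0.74 0.64]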
# Clip to image boundaries. Since we're in normalized coordinates,
# clip to 0..1 range. [batch, N, (y1, x1, y2, x2)]
window = np.array([0, 0, 1, 1], dtype=np.float32)
boxes = utils.batch_slice(boxes,
                          lambda x: clip_boxes_graph(x, window),
                          self.config.IMAGES_PER_GPU,
                          names=["refined_anchors_clipped"])
This handles predicted boxes that fall outside the 0-1 range, which is not acceptable: a detection box must lie inside the image. So values below 0 are forced to 0 and values above 1 are forced to 1, ensuring every predicted box stays inside the image.
clip_boxes_graph
def clip_boxes_graph(boxes, window):
    """
    boxes: [N, (y1, x1, y2, x2)]
    window: [4] in the form y1, x1, y2, x2
    """
    # Split
    wy1, wx1, wy2, wx2 = tf.split(window, 4)
    y1, x1, y2, x2 = tf.split(boxes, 4, axis=1)
    # Clip
    y1 = tf.maximum(tf.minimum(y1, wy2), wy1)
    x1 = tf.maximum(tf.minimum(x1, wx2), wx1)
    y2 = tf.maximum(tf.minimum(y2, wy2), wy1)
    x2 = tf.maximum(tf.minimum(x2, wx2), wx1)
    clipped = tf.concat([y1, x1, y2, x2], axis=1, name="clipped_boxes")
    clipped.set_shape((clipped.shape[0], 4))
    return clipped
In short, any box that strays outside the 0-1 range is forced back into the 0-1 interval.
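The same clipping in NumPy, for intuition (the boxes are made up):

import numpy as np

boxes = np.array([[-0.04, 0.10, 0.50, 1.15],   # sticks out on top and right
                  [ 0.20, 0.30, 0.60, 0.70]])  # already inside the window
window = np.array([0.0, 0.0, 1.0, 1.0])        # (y1, x1, y2, x2)

clipped = np.concatenate([
    np.clip(boxes[:, :2], window[:2], window[2:]),   # clip (y1, x1)
    np.clip(boxes[:, 2:], window[:2], window[2:]),   # clip (y2, x2)
], axis=1)
print(clipped)
# [[0.   0.1  0.5  1. ]
#  [0.2  0.3  0.6  0.7]]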
NMS
Everything needed is now ready, so the NMS step begins.
# Non-max suppression
def nms(boxes, scores):
    indices = tf.image.non_max_suppression(
        boxes, scores, self.proposal_count,
        self.nms_threshold, name="rpn_non_max_suppression")
    proposals = tf.gather(boxes, indices)
    # Pad if needed
    padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
    # Paddings are (top, bottom), (left, right): zero rows are appended at the bottom.
    proposals = tf.pad(proposals, [(0, padding), (0, 0)])
    return proposals
proposals = utils.batch_slice([boxes, scores], nms,
                              self.config.IMAGES_PER_GPU)  # final shape is [batch_size, N=2000, 4]
boxes: the predicted boxes, with shape (1, 6000, 4);
scores: the top 6000 foreground probabilities selected by score, one-to-one with the predicted boxes, with shape (1, 6000);
self.proposal_count: 2000, only 2000 foreground boxes are kept;
self.nms_threshold: 0.7.
tf.image.non_max_suppression() returns the indices of the boxes to keep, and gathering the predicted boxes at those indices gives proposals. These proposals might have a shape like (?, 1500, 4) if only 1500 boxes survive, but the output is required to have shape (?, 2000, 4), so it is padded with zeros.
The proposals returned here are exactly rpn_rois.
That essentially wraps up the RPN; the RPN losses will be covered later.
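A toy sketch of the NMS-then-pad step with made-up boxes (here proposal_count is 4 instead of 2000):

import tensorflow as tf

boxes = tf.constant([[0.10, 0.10, 0.50, 0.50],
                     [0.12, 0.11, 0.52, 0.51],   # heavy overlap with the first box
                     [0.60, 0.60, 0.90, 0.90]])
scores = tf.constant([0.9, 0.8, 0.7])
proposal_count = 4                                # pretend POST_NMS_ROIS is 4

indices = tf.image.non_max_suppression(boxes, scores, proposal_count,
                                        iou_threshold=0.7)
proposals = tf.gather(boxes, indices)             # the overlapping box is suppressed
# Pad with all-zero rows at the bottom so the output always has 4 rows.
padding = tf.maximum(proposal_count - tf.shape(proposals)[0], 0)
proposals = tf.pad(proposals, [(0, padding), (0, 0)])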
A quick summary:
A convolutional network first outputs foreground/background confidence scores and offsets for the preset anchor boxes; the top 6000 anchors are then selected by foreground confidence; those anchors are refined with the predicted offsets to obtain the predicted boxes; and finally NMS on the predicted boxes yields the final proposals.
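To tie it all together, here is a small single-image NumPy sketch of the whole ProposalLayer flow described above (toy sizes, random data, and a naive greedy NMS; it only mirrors the logic, not the actual TensorFlow implementation):

import numpy as np

def toy_proposal_layer(scores, deltas, anchors, pre_nms_limit=6000,
                       nms_threshold=0.7, proposal_count=2000,
                       bbox_std_dev=(0.1, 0.1, 0.2, 0.2)):
    """Single-image sketch of the ProposalLayer steps described above."""
    deltas = deltas * np.array(bbox_std_dev)
    # 1. Keep the highest-scoring anchors (top-k).
    ix = np.argsort(-scores)[:pre_nms_limit]
    scores, deltas, anchors = scores[ix], deltas[ix], anchors[ix]
    # 2. Apply the deltas (same math as apply_box_deltas_graph).
    h = anchors[:, 2] - anchors[:, 0]
    w = anchors[:, 3] - anchors[:, 1]
    cy = anchors[:, 0] + 0.5 * h + deltas[:, 0] * h
    cx = anchors[:, 1] + 0.5 * w + deltas[:, 1] * w
    h, w = h * np.exp(deltas[:, 2]), w * np.exp(deltas[:, 3])
    boxes = np.stack([cy - 0.5 * h, cx - 0.5 * w,
                      cy + 0.5 * h, cx + 0.5 * w], axis=1)
    # 3. Clip to the 0-1 window.
    boxes = np.clip(boxes, 0.0, 1.0)
    # 4. Greedy NMS (boxes are already sorted by score), then pad with zeros.
    keep, order = [], list(range(len(boxes)))
    while order and len(keep) < proposal_count:
        i = order.pop(0)
        keep.append(i)
        remaining = []
        for j in order:
            yy1, xx1 = max(boxes[i, 0], boxes[j, 0]), max(boxes[i, 1], boxes[j, 1])
            yy2, xx2 = min(boxes[i, 2], boxes[j, 2]), min(boxes[i, 3], boxes[j, 3])
            inter = max(0.0, yy2 - yy1) * max(0.0, xx2 - xx1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_j = (boxes[j, 2] - boxes[j, 0]) * (boxes[j, 3] - boxes[j, 1])
            if inter / (area_i + area_j - inter + 1e-9) <= nms_threshold:
                remaining.append(j)
        order = remaining
    proposals = boxes[keep]
    pad = proposal_count - len(proposals)
    return np.vstack([proposals, np.zeros((pad, 4))])

# Toy run: 500 random anchors, keep 300 before NMS, return exactly 100 ROIs.
rng = np.random.default_rng(0)
tl = rng.uniform(0.0, 0.8, (500, 2))
anchors = np.concatenate([tl, tl + rng.uniform(0.05, 0.2, (500, 2))], axis=1)
rois = toy_proposal_layer(rng.uniform(0, 1, 500), rng.normal(0, 1, (500, 4)),
                          anchors, pre_nms_limit=300, proposal_count=100)
print(rois.shape)  # (100, 4)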