matterport/Mask_RCNN
- model.py is the main file that builds the network.
- The anchor-generation part of utils.py, which mainly involves the following functions:
RPN part
scales:(32, 64, 128, 256, 512)
ratios:(0.5, 1, 2)
feature_shapes:[[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
feature_strides:[4, 8, 16, 32, 64]
anchor_stride:1
generate_pyramid_anchors() calls generate_anchors() in a loop. scales, feature_shapes, and feature_strides are matched element by element, and each matched triple is passed to one generate_anchors() call.
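The input image is 1024x1024, so 4x downsampling gives a (256, 256) feature map and 8x downsampling gives (128, 128); this is how feature_strides and feature_shapes correspond. scales expresses that an anchor with the same size on the feature map at every FPN level maps to a differently sized anchor on the input image. On the (256, 256) level the anchor is 32x32 in the image. On the (128, 128) level the anchor size on the feature map is unchanged, but the receptive field of each cell is 2x larger (the stride doubles), so the anchor is 64x64 in the image, and so on for the remaining levels.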
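The correspondence described above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a 1024x1024 input and the stride and scale values listed earlier; the variable names `image_shape` and `anchor_cells` are made up for illustration and are not taken from the repository.

```python
import math

image_shape = (1024, 1024)               # assumed input size (height, width)
feature_strides = [4, 8, 16, 32, 64]
scales = (32, 64, 128, 256, 512)

# Each pyramid level is the input downsampled by its stride.
feature_shapes = [[math.ceil(image_shape[0] / s), math.ceil(image_shape[1] / s)]
                  for s in feature_strides]

# Anchor side length measured in feature-map cells: the same at every level.
anchor_cells = [scale / stride for scale, stride in zip(scales, feature_strides)]

print(feature_shapes)  # [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
print(anchor_cells)    # [8.0, 8.0, 8.0, 8.0, 8.0]
```

Every level uses an anchor that is 8 cells wide on its own feature map; only the stride changes what that means in image pixels.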
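The generate_anchors() function: with the call generate_anchors(32, [0.5, 1, 2], [256, 256], 4, 1), anchors are generated over a 256x256 feature map. The anchor is 8x8 on the feature map (32 / 4), which maps to 32x32 on the input image. shifts_y and shifts_x slide a window over the elements of the 256x256 feature map with a step of anchor_stride. Here anchor_stride is 1, so there are 256*256 positions, and with three ratios there are 256x256x3 anchors in total. Multiplying by feature_stride maps the positions back onto the input image. Finally, the top-left and bottom-right corners of each box are computed from its center coordinates, height, and width.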
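As a worked example of the numbers in this paragraph, the sketch below computes the three anchors for one feature-map cell. The cell position (row 2, column 3) is an arbitrary choice for illustration, and the variable names are assumptions rather than names from utils.py.

```python
import numpy as np

scale = 32
ratios = np.array([0.5, 1.0, 2.0])
feature_stride = 4

# Heights and widths for each ratio; every anchor keeps the area scale**2.
heights = scale / np.sqrt(ratios)
widths = scale * np.sqrt(ratios)
areas = heights * widths               # all equal to 1024

# Feature-map cell (row=2, col=3) mapped back to image coordinates.
center_y, center_x = 2 * feature_stride, 3 * feature_stride   # (8, 12)

# Center + size -> corners (y1, x1, y2, x2), one row per ratio.
boxes = np.stack([center_y - 0.5 * heights,
                  center_x - 0.5 * widths,
                  center_y + 0.5 * heights,
                  center_x + 0.5 * widths], axis=1)

# Anchor count for the whole 256x256 level with anchor_stride = 1.
num_anchors = 256 * 256 * len(ratios)  # 196608
```

For the square anchor (ratio 1) this gives the box (-8, -4, 24, 28). Anchors near the border extend outside the image, which is expected at this stage.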
import numpy as np

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    """
    scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
    ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
    shape: [height, width] spatial shape of the feature map over which
            to generate anchors.
    feature_stride: Stride of the feature map relative to the image in pixels.
    anchor_stride: Stride of anchors on the feature map. For example, if the
        value is 2 then generate anchors for every other feature map pixel.
    """
    # Get all combinations of scales and ratios
    # e.g. scales = 32 and ratios = [0.5, 1, 2]
    scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
    scales = scales.flatten()  # array([32, 32, 32])
    ratios = ratios.flatten()  # array([ 0.5, 1. , 2. ])
    # Enumerate heights and widths from scales and ratios
    heights = scales / np.sqrt(ratios)  # array([ 45.254834, 32., 22.627417])
    widths = scales * np.sqrt(ratios)  # array([ 22.627417, 32. , 45.254834])
    # Enumerate shifts in feature space, e.g. anchor_stride = 1, feature_stride = 4
    # here, shape = (256, 256)
    # shifts_y.shape = (256,), shifts_x.shape = (256,)
    shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
    shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
    # shifts_x.shape = (256, 256) ---> [[0, 1, ..., 255], [0, 1, ..., 255], ...] * 4
    # shifts_y.shape = (256, 256) ---> [[0, 0, ..., 0], [1, 1, ..., 1], ...] * 4
    shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
    # Enumerate combinations of shifts, widths, and heights
    # box_widths.shape = (256*256, 3) ---> each row is [22.627417, 32., 45.254834]
    # box_centers_x.shape = (256*256, 3) ---> each column is [0, 1, ..., 255, 0, 1, ..., 255, ...] * 4
    box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
    box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
    # Reshape to get a list of (y, x) and a list of (h, w)
    # Both have shape (256*256*3, 2)
    box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
    box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
    # Convert to corner coordinates (y1, x1, y2, x2)
    boxes = np.concatenate([box_centers - 0.5 * box_sizes,
                            box_centers + 0.5 * box_sizes], axis=1)
    # For generate_anchors(32, [0.5, 1, 2], [256, 256], 4, 1), boxes.shape = (196608, 4)
    return boxes
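To see what the two rounds of np.meshgrid produce, it can help to run them on a toy feature map. The sketch below assumes a hypothetical 2x3 feature map with feature_stride 4 and reuses the three widths and heights from the scale-32 case; it only illustrates the grid construction and is not code from utils.py.

```python
import numpy as np

feature_stride, anchor_stride = 4, 1
shape = (2, 3)                                   # toy [height, width]
widths = np.array([22.627417, 32.0, 45.254834])
heights = np.array([45.254834, 32.0, 22.627417])

shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride  # [0, 4]
shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride  # [0, 4, 8]

# First meshgrid: one (y, x) pair per feature-map cell.
shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
# shifts_x == [[0, 4, 8], [0, 4, 8]]
# shifts_y == [[0, 0, 0], [4, 4, 4]]

# Second meshgrid: the 2D shift grids are flattened, then paired with every size.
box_widths, box_centers_x = np.meshgrid(widths, shifts_x)
box_heights, box_centers_y = np.meshgrid(heights, shifts_y)
# All four arrays have shape (2*3, 3): one row per cell, one column per ratio.
```

Each row of box_widths repeats the three widths, while each column of box_centers_x walks through the cells in row-major order, which is why the final reshape yields H*W*len(ratios) boxes.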
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.
    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.

    Example values:
    scales = (32, 64, 128, 256, 512)
    ratios = [0.5, 1, 2]
    here, feature_shapes*feature_strides = [[1024, 1024], [1024, 1024], ...], input image is 1024x1024
    feature_shapes =
        [[256 256]
         [128 128]
         [ 64  64]
         [ 32  32]
         [ 16  16]]
    feature_strides = [4, 8, 16, 32, 64]
    anchor_stride = 1
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)
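Because each level contributes height x width x len(ratios) anchors, the size of the concatenated array can be worked out without running the generator. A small sketch, assuming the example values quoted in the docstring above (the names `per_level` and `total` are illustrative):

```python
feature_shapes = [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
ratios = [0.5, 1, 2]
anchor_stride = 1

# Positions per axis follow np.arange(0, size, anchor_stride).
per_level = [len(range(0, h, anchor_stride)) * len(range(0, w, anchor_stride)) * len(ratios)
             for h, w in feature_shapes]
total = sum(per_level)

print(per_level)  # [196608, 49152, 12288, 3072, 768]
print(total)      # 261888
```

With these settings, the array returned by generate_pyramid_anchors() would have shape (261888, 4), and the first 196608 rows belong to the scale-32 level.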
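The ProposalLayer class in model.py implements the network's proposal layer, which sits between the RPN and the R-CNN stage. Its inputs are rpn_probs and rpn_bbox, as annotated in the code. Its output has shape (None, self.proposal_count, 4), where self.proposal_count is the number of proposals and 4 is the box coordinates. The layer processes all proposals produced by the RPN: it selects the top_k candidate boxes by fg prob score, then uses non-maximum suppression to reduce the number of proposals further. One detail is that the rpn_bbox predicted by the RPN, (dy, dx, log(dh), log(dw)), is used to refine the original coordinates (y1, x1, y2, x2) of the selected anchors. rpn_bbox is a predicted offset, not an absolute position, and it is applied by the function apply_box_deltas_graph(anchors, deltas).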
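The three steps described above (keep the top-scoring anchors, apply the predicted deltas, run non-maximum suppression) can be sketched in plain NumPy on a tiny example. This is an illustration under assumptions, not the layer's code: `apply_box_deltas_np` and `nms_np` are hypothetical helper names, the NMS is a simple greedy variant, and the real layer works on batched TensorFlow tensors and may include steps that are not shown here.

```python
import numpy as np

def apply_box_deltas_np(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)]; deltas: [N, (dy, dx, log(dh), log(dw))]."""
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Centers shift by a fraction of the anchor size; sizes scale by exp(delta).
    center_y = center_y + deltas[:, 0] * height
    center_x = center_x + deltas[:, 1] * width
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)

def nms_np(boxes, scores, iou_threshold, max_count):
    """Greedy NMS. Returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0 and len(keep) < max_count:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        y1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        x1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        y2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        x2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap box i too much.
        order = rest[iou <= iou_threshold]
    return keep

# Toy inputs: three anchors, their foreground scores, and predicted deltas.
anchors = np.array([[0.0, 0.0, 10.0, 10.0],
                    [0.5, 0.5, 10.5, 10.5],
                    [50.0, 50.0, 60.0, 60.0]])
fg_scores = np.array([0.9, 0.8, 0.7])
deltas = np.zeros((3, 4))
deltas[2] = [0.1, 0.0, np.log(2.0), 0.0]   # move down 10% of height, double the height

top = np.argsort(-fg_scores)[:3]           # stand-in for the top_k selection
refined = apply_box_deltas_np(anchors[top], deltas[top])
kept = nms_np(refined, fg_scores[top], iou_threshold=0.7, max_count=2)
```

In this toy run the third anchor is refined to roughly (46, 50, 66, 60), and the second anchor is suppressed because its IoU with the first is about 0.82, so `kept` is [0, 2].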
class ProposalLayer(KE.Layer):
    """Receives anchor scores and selects a subset to pass as proposals
    to the second stage. Filtering is done based on anchor scores and
    non-max suppression to remove overlaps. It also applies bounding
    box refinement deltas to anchors.
    Inputs:
        rpn_probs: [batch, anchors, (bg prob, fg prob)]
        rpn_bbox: [batch, anchors, (dy, dx, log(dh), log(dw))]
    Returns:
        Proposals in normalized coordinates [batch, rois, (y1, x1, y2, x2)]
    """

    def __init__(self, proposal_count, nms_threshold, anchors,
                 config=None, **kwargs):
        """
        proposal_count = 2000 or 1000 below, nms_threshold=0.7
        anchors: [N, (y1, x1, y2, x2)] anchors defined in image coordinates
        """
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold
        self.anchors = anchors.astype(np.float32)

    def call(self, inputs):
        # inputs is [rpn_class, rpn_bbox]
        # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
        scores = inputs[0][:, :, 1]
        # Box deltas [batch, num_rois, 4]
        deltas = inputs[1]
        # RPN_BBOX_STD_DEV = np.array([0.08, 0.08, 0.17, 0.17])
        # RPN_BBOX_STD_MEANS = np.array([0.02, 0.02, 0.01, 0.02])
        deltas = (deltas + np.reshape(self.config.RPN_BBOX_STD_MEANS, [1, 1, 4])) * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Base anchors
        anchors = self.anchors
        # Improve performance by trimming to top anchors by score
        # and doing the rest on the smaller subset.
        # self.anchors is generated by utils.generate_pyramid_anchors
        # anchors.shape = [anchor_num, 4]
        pre_nms_limit = min(6000, self.anchors.shape[0])
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        anchors = utils.batch_slice(ix, lambda x: tf.gather(anchors, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])
        # Apply deltas to anchors to get refined anchors.
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,