Mask_RCNN Code Walkthrough (2): The Region Proposal Network (RPN)

Mask R-CNN is a network built on top of Faster R-CNN, used mainly for object detection and instance segmentation. Its core idea is to extend the Faster R-CNN framework with a Mask branch for pixel-level segmentation.

The source code read here is matterport/Mask_RCNN, a complete implementation built with Python 3, Keras, and TensorFlow.

The full walkthrough is split into four parts, in order:

1. Backbone Network code
2. Region Proposal Network (RPN) code
3. Network Heads code
4. Losses code

The code that builds the whole Mask R-CNN model lives in mrcnn/model.py and is worth reading through in detail.

This post covers the second part: the Region Proposal Network (RPN) code.

1. Region Proposal Network (RPN) code

```python
# Anchors
if mode == "training":
    anchors = self.get_anchors(config.IMAGE_SHAPE)
    # Duplicate across the batch dimension because Keras requires it
    anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
    # A hack to get around Keras's bad support for constants
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
else:
    anchors = input_anchors

# ---------------------------------------------------------------------

# RPN Model
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS), config.TOP_DOWN_PYRAMID_SIZE)
# Loop through pyramid layers
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
# Concatenate layer outputs
# Convert from list of lists of level outputs to list of lists
# of outputs across levels.
# e.g. [[a1, b1, c1], [a2, b2, c2]] => [[a1, a2], [b1, b2], [c1, c2]]
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
outputs = list(zip(*layer_outputs))
outputs = [KL.Concatenate(axis=1, name=n)(list(o))
           for o, n in zip(outputs, output_names)]
rpn_class_logits, rpn_class, rpn_bbox = outputs

# Generate proposals
# Proposals are [batch, N, (y1, x1, y2, x2)] in normalized coordinates
# and zero padded.
proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training" \
    else config.POST_NMS_ROIS_INFERENCE
rpn_rois = ProposalLayer(
    proposal_count=proposal_count,
    nms_threshold=config.RPN_NMS_THRESHOLD,
    name="ROI",
    config=config)([rpn_class, rpn_bbox, anchors])
```

2. Region Proposal Network (RPN) structure

[Figure: RPN structure diagram]

In the diagram, the anchors box on the far left is the anchor-generation flow, build_rpn_model -> rpn_graph in the middle builds the RPN network itself, and ProposalLayer on the far right is the proposal-generation flow that filters the ROIs.

Because rpn_bbox in the RPN output predicts offsets between the anchors and the ground-truth boxes, ProposalLayer is needed to filter the anchors down in number and refine their positions.

2.1 anchors code

```python
# mrcnn/model.py
def get_anchors(self, image_shape):
    ...
    a = utils.generate_pyramid_anchors(
        self.config.RPN_ANCHOR_SCALES,
        self.config.RPN_ANCHOR_RATIOS,
        backbone_shapes,
        self.config.BACKBONE_STRIDES,
        self.config.RPN_ANCHOR_STRIDE)
    ...

# ---------------------------------------------------------------------

# mrcnn/utils.py
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i], ratios, feature_shapes[i],
                                        feature_strides[i], anchor_stride))
    return np.concatenate(anchors, axis=0)

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
    ...
    return boxes
```

get_anchors calls into utils.py (see the anchors flowchart). The config parameter RPN_ANCHOR_SCALES holds the anchor sizes, (32, 64, 128, 256, 512), which map one-to-one onto the rpn_feature_maps [P2, P3, P4, P5, P6] with resolutions [256, 128, 64, 32, 16]. In other words, the high-resolution lower-level features detect smaller objects, while the low-resolution top-level feature maps detect larger objects.

RPN_ANCHOR_RATIOS holds the anchor aspect ratios; every anchor size is instantiated at the three ratios [0.5, 1, 2].

BACKBONE_STRIDES holds the downsampling factors of the feature maps: [4, 8, 16, 32, 64].

BACKBONE_SHAPES holds the feature-map resolutions, [256, 128, 64, 32, 16] (for the default 1024x1024 input, i.e. 1024 divided by each stride).

generate_anchors is the function that actually generates the anchors for one feature level; generate_pyramid_anchors concatenates the anchors across scales, giving a final array of shape [anchor_count, (y1, x1, y2, x2)]. Here anchor_count = (256*256 + 128*128 + 64*64 + 32*32 + 16*16) * 3 = 261888. Anchors this numerous cannot all be used for prediction, hence the subsequent ProposalLayer filtering.
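The arithmetic is easy to verify. Below is a quick sanity check in plain Python, assuming the default 1024x1024 input (so the pyramid levels are 256 down to 16 cells on a side) and the default RPN_ANCHOR_STRIDE of 1:

```python
# Verify the anchor count quoted above: one anchor per feature-map cell
# per aspect ratio, summed over the five pyramid levels P2..P6.
feature_shapes = [256, 128, 64, 32, 16]  # P2..P6 side lengths (1024 / stride)
ratios_per_cell = 3                      # aspect ratios [0.5, 1, 2]
anchor_count = sum(s * s for s in feature_shapes) * ratios_per_cell
print(anchor_count)  # 261888
```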

2.2 build_rpn_model -> rpn_graph code

```python
def build_rpn_model(anchor_stride, anchors_per_location, depth):
    input_feature_map = KL.Input(shape=[None, None, depth],
                                 name="input_rpn_feature_map")
    outputs = rpn_graph(input_feature_map, anchors_per_location, anchor_stride)
    return KM.Model([input_feature_map], outputs, name="rpn_model")

def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)
    # Anchor Score. [batch, height, width, anchors per location * 2].
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)
    # Reshape to [batch, anchors, 2]
    rpn_class_logits = KL.Lambda(
        lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 2]))(x)
    # Softmax on last dimension of BG/FG.
    rpn_probs = KL.Activation(
        "softmax", name="rpn_class_xxx")(rpn_class_logits)
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)
    # Reshape to [batch, anchors, 4]
    rpn_bbox = KL.Lambda(lambda t: tf.reshape(t, [tf.shape(t)[0], -1, 4]))(x)
    return [rpn_class_logits, rpn_probs, rpn_bbox]
```

build_rpn_model calls rpn_graph to construct the RPN (see the RPN construction diagram). It first runs a 3x3 convolution over the feature map, then splits into two branches, one for classification and one for box regression. In this code anchors_per_location = 3, because each anchor size on a feature map is instantiated at three aspect ratios. rpn_feature_maps is the list of feature maps the RPN runs on, [P2, P3, P4, P5, P6]; each of them is passed through the same 3x3 convolution to produce the shared layer:

```python
shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                   strides=anchor_stride,
                   name='rpn_conv_shared')(feature_map)
```

One branch off shared handles classification; 2 * anchors_per_location means every anchor predicts its foreground/background probability:

```python
x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
              activation='linear', name='rpn_class_raw')(shared)
```

The other branch off shared regresses the boxes; 4 * anchors_per_location means every anchor predicts four position-related values. The regression uses a center-point encoding of the box, [dy, dx, log(dh), log(dw)]:

```python
x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
              activation='linear', name='rpn_bbox_pred')(shared)
```

The final output is [rpn_class_logits, rpn_probs, rpn_bbox], where rpn_probs are the predicted class probabilities and rpn_bbox the predicted box offsets.
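To make the output shapes concrete, here is a small NumPy sketch (hypothetical batch and feature-map sizes, not part of the original code) of the reshape that turns the per-pixel convolution output into a flat per-anchor list, mirroring the tf.reshape calls in rpn_graph:

```python
import numpy as np

# rpn_class_raw output for one P2-sized feature map: 3 anchors per cell,
# 2 scores (BG/FG) per anchor.
batch, h, w, anchors_per_location = 1, 256, 256, 3
raw = np.zeros((batch, h, w, 2 * anchors_per_location))

# Same reshape as the KL.Lambda(tf.reshape(...)) in rpn_graph.
logits = raw.reshape(batch, -1, 2)
print(logits.shape)  # (1, 196608, 2): 256 * 256 * 3 anchors on P2
```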

2.3 ProposalLayer code

```python
class ProposalLayer(KE.Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    def call(self, inputs):
        # Box Scores. Use the foreground class confidence. [Batch, num_rois, 1]
        scores = inputs[0][:, :, 1]
        # Box deltas [batch, num_rois, 4]
        deltas = inputs[1]
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]

        # Improve performance by trimming to top anchors by score
        # and doing the rest on the smaller subset.
        pre_nms_limit = tf.minimum(6000, tf.shape(anchors)[1])
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])

        # Apply deltas to anchors to get refined anchors.
        # [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])

        # Clip to image boundaries. Since we're in normalized coordinates,
        # clip to 0..1 range. [batch, N, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # Pad if needed
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)
        return proposals

    def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)
```

ProposalLayer first selects a subset of anchors by score, then refines the anchor boxes and clips those crossing the image boundary, and finally runs non-maximum suppression on the survivors to produce the final rpn_rois (see the proposal flowchart). The layer takes three inputs, [rpn_class, rpn_bbox, anchors], named [scores, deltas, anchors] inside the layer: rpn_class and rpn_bbox are the classification and box-offset predictions from 2.2, and anchors are the 261888 anchors generated in 2.1. When training Mask R-CNN, a single image cannot possibly use that many ROIs, so further filtering is required.

tf.nn.top_k picks the indices of at most the top 6000 ROIs by rpn_class probability (which can be read as an ROI score), and those indices then select the corresponding top-6000 entries of [scores, deltas, anchors]. tf.gather picks the values of x at the indices y.
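As a tiny illustration of that top-k + gather pattern (TF 2 eager mode with made-up scores, not part of the original TF 1-era code):

```python
import tensorflow as tf

scores = tf.constant([0.1, 0.9, 0.4, 0.7])
ix = tf.nn.top_k(scores, k=2, sorted=True).indices  # [1, 3]: the two best scores
print(tf.gather(scores, ix).numpy())                # [0.9 0.7]
```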

```python
pre_nms_limit = tf.minimum(6000, tf.shape(anchors)[1])
ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                 name="top_anchors").indices
scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                           self.config.IMAGES_PER_GPU)
pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                    self.config.IMAGES_PER_GPU,
                                    names=["pre_nms_anchors"])
```

apply_box_deltas_graph then refines the anchors with the predicted deltas:

```python
boxes = utils.batch_slice([pre_nms_anchors, deltas],
                          lambda x, y: apply_box_deltas_graph(x, y),
                          self.config.IMAGES_PER_GPU, names=["refined_anchors"])
```
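For intuition, here is a minimal NumPy sketch of the refinement formula that apply_box_deltas_graph implements in TensorFlow (a single box with hypothetical values; the real function operates on whole tensors):

```python
import numpy as np

def apply_deltas(box, delta):
    """box: [y1, x1, y2, x2]; delta: [dy, dx, log(dh), log(dw)]."""
    h, w = box[2] - box[0], box[3] - box[1]
    cy, cx = box[0] + 0.5 * h, box[1] + 0.5 * w
    cy += delta[0] * h                 # shift the center
    cx += delta[1] * w
    h *= np.exp(delta[2])              # rescale the size
    w *= np.exp(delta[3])
    return np.array([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w])

# Shift a normalized anchor down by 10% of its height.
print(apply_deltas(np.array([0.2, 0.2, 0.4, 0.4]),
                   np.array([0.1, 0.0, 0.0, 0.0])))  # [0.22 0.2 0.42 0.4]
```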

clip_boxes_graph clips boxes that extend beyond the image; since the coordinates are normalized, clamping them to the [0, 1] window is enough:

```python
window = np.array([0, 0, 1, 1], dtype=np.float32)
boxes = utils.batch_slice(boxes, lambda x: clip_boxes_graph(x, window),
                          self.config.IMAGES_PER_GPU, names=["refined_anchors_clipped"])
```
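The clipping itself is just a coordinate clamp; in NumPy terms (an illustrative one-liner, not the library code):

```python
import numpy as np

# A box poking out of the normalized image window gets clamped to [0, 1].
boxes = np.array([[-0.1, 0.2, 0.8, 1.3]])  # [y1, x1, y2, x2]
print(np.clip(boxes, 0.0, 1.0))            # [[0.  0.2 0.8 1. ]]
```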

nms runs non-maximum suppression, selecting up to proposal_count ROIs (2000 during training, per POST_NMS_ROIS_TRAINING) by the IoU threshold; if fewer than proposal_count boxes survive, the result is zero-padded:

```python
def nms(boxes, scores):
    indices = tf.image.non_max_suppression(boxes, scores, self.proposal_count,
                                           self.nms_threshold, name="rpn_non_max_suppression")
    proposals = tf.gather(boxes, indices)
    # Pad if needed
    padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
    proposals = tf.pad(proposals, [(0, padding), (0, 0)])
    return proposals
```

```python
proposals = utils.batch_slice([boxes, scores], nms, self.config.IMAGES_PER_GPU)
```

The returned proposals, of shape [batch, proposal_count, 4], are assigned to rpn_rois and serve as the region proposals produced by the RPN; they feed the downstream FPN heads for classification, box regression, and mask segmentation.

That wraps up part two. Comments and corrections are welcome!
