Code: https://github.com/endernewton/tf-faster-rcnn
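1. First, a 3×3 convolution (cfg.RPN_CHANNELS output channels) is applied to the 512-channel feature map produced by VGG16. Two 1×1 convolutions with 9*2 and 9*4 output channels follow. Their kernels start from the initializer, so the resulting rpn_cls_score and rpn_bbox_pred have no intrinsic meaning at first [1]. They acquire it as the corresponding losses are backpropagated and the kernels are updated. rpn_cls_score scores whether each box (anchor) is foreground or background, and rpn_bbox_pred predicts the offsets (deltas) between each box and its ground-truth box.

A softmax normalizes rpn_cls_score into foreground/background probabilities, so that for every box the two probabilities sum to 1. Each point on the feature map has 9 boxes (anchors), and each box has one background and one foreground probability, so rpn_cls_prob has shape (1, ?, ?, 18). rpn_cls_pred then uses argmax to pick the more likely of the two classes, labeling each box as foreground or background. In the code the argmax is taken over rpn_cls_score_reshape rather than over the probabilities; the result is the same because softmax preserves the ordering of the scores.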
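The reshape, softmax and argmax steps above can be checked with a small NumPy sketch. `reshape_layer` and `softmax_last_axis` are hypothetical stand-ins, not the repo's code: `reshape_layer` assumes a Caffe-style channel reshape (move channels first, fold them to the requested size, move them back), which is an assumption about what `_reshape_layer` does. The toy feature-map size `H, W` is arbitrary.

```python
import numpy as np

def reshape_layer(x, num_dim):
    """Hypothetical NumPy stand-in for a Caffe-style channel reshape.

    x has NHWC shape (1, H, W, C). Channels are moved first, folded so the
    channel axis has size num_dim, then moved last again.
    """
    n, h, w, c = x.shape
    to_caffe = x.transpose(0, 3, 1, 2)              # (1, C, H, W)
    reshaped = to_caffe.reshape(n, num_dim, -1, w)  # (1, num_dim, C*H/num_dim, W)
    return reshaped.transpose(0, 2, 3, 1)           # (1, C*H/num_dim, W, num_dim)

def softmax_last_axis(x):
    # Subtract the max for numerical stability, then normalize over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

num_anchors = 9
H, W = 4, 5  # toy feature-map size (assumption)
rng = np.random.default_rng(0)
rpn_cls_score = rng.normal(size=(1, H, W, num_anchors * 2))

# (1, H, W, 18) -> (1, 9H, W, 2): two class scores per anchor on the last axis
rpn_cls_score_reshape = reshape_layer(rpn_cls_score, 2)
# softmax over the 2-class axis, so each anchor's two probabilities sum to 1
rpn_cls_prob_reshape = softmax_last_axis(rpn_cls_score_reshape)
# fold back to (1, H, W, 18)
rpn_cls_prob = reshape_layer(rpn_cls_prob_reshape, num_anchors * 2)

# argmax over raw scores equals argmax over probabilities (softmax is monotonic)
pred_from_score = np.argmax(rpn_cls_score_reshape.reshape(-1, 2), axis=1)
pred_from_prob = np.argmax(rpn_cls_prob_reshape.reshape(-1, 2), axis=1)
```

Under this assumed layout, class slot 0 collects channels 0 to 8 of the 18 and slot 1 collects channels 9 to 17, rather than interleaved pairs, and folding back with `num_dim = 18` restores the original tensor exactly.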
The RPN and the R-CNN head share the backbone convolutional layers (VGG16) that produce this feature map.
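For the regression branch, rpn_bbox_pred outputs 9*4 values per feature-map point, i.e. 4 deltas per anchor. The sketch below assumes the standard Faster R-CNN box parameterization (center offsets scaled by the anchor size, log-scaled width and height ratios) and a "+1 pixel" width convention. The function names are hypothetical and this is not the repo's `bbox_transform` code.

```python
import numpy as np

def _centers_and_sizes(boxes):
    """Return (cx, cy, w, h) for (N, 4) boxes given as (x1, y1, x2, y2).

    Uses the "+1 pixel" width convention (an assumption for this sketch).
    """
    w = boxes[:, 2] - boxes[:, 0] + 1.0
    h = boxes[:, 3] - boxes[:, 1] + 1.0
    return boxes[:, 0] + 0.5 * w, boxes[:, 1] + 0.5 * h, w, h

def boxes_to_deltas(anchors, gt_boxes):
    """Encode gt_boxes relative to anchors as (dx, dy, dw, dh) regression targets."""
    ax, ay, aw, ah = _centers_and_sizes(anchors)
    gx, gy, gw, gh = _centers_and_sizes(gt_boxes)
    return np.stack([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)], axis=1)

def deltas_to_boxes(anchors, deltas):
    """Inverse of boxes_to_deltas: apply (dx, dy, dw, dh) to anchors."""
    ax, ay, aw, ah = _centers_and_sizes(anchors)
    cx = deltas[:, 0] * aw + ax
    cy = deltas[:, 1] * ah + ay
    w = np.exp(deltas[:, 2]) * aw
    h = np.exp(deltas[:, 3]) * ah
    return np.stack([cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w - 1.0, cy + 0.5 * h - 1.0], axis=1)

anchors = np.array([[0.0, 0.0, 15.0, 15.0]])   # one 16x16 anchor
gt_boxes = np.array([[8.0, 0.0, 23.0, 15.0]])  # same size, shifted 8 px to the right
deltas = boxes_to_deltas(anchors, gt_boxes)    # → [[0.5, 0.0, 0.0, 0.0]]
```

Encoding and decoding are exact inverses: during training the regression branch is pushed toward the encoded targets, and at inference the predicted deltas are applied to the anchors to obtain proposal boxes.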
# Base CNN parameters (VGG16, ZF, etc.) come from ImageNet pre-training; the other layers are initialized from a Gaussian with mean 0 and standard deviation 0.01 [6]
def _region_proposal(self, net_conv, is_training, initializer):
    # 3x3 convolution on the backbone feature map, shared by both RPN branches
    rpn = slim.conv2d(net_conv, cfg.RPN_CHANNELS, [3, 3], trainable=is_training, weights_initializer=initializer,
                      scope="rpn_conv/3x3")
    self._act_summaries.append(rpn)
    # 1x1 convolution: two class scores (foreground/background) for each anchor
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                                weights_initializer=initializer,
                                padding='VALID', activation_fn=None, scope='rpn_cls_score')
    # change it so that the score has 2 as its channel size
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
    # softmax over the 2-class axis
    rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
    # Decide whether each box is foreground or background
    rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name="rpn_cls_pred")
    rpn_cls_prob = self._reshape_layer(rpn_cls_prob_