Faster R-CNN Code Analysis
Last year I was rushing to write my thesis, so I stopped partway through my analysis of the Faster R-CNN code. Over the next two weeks I will work through the rest of the code, covering every detail and explaining the intent behind each piece of it. Let's start with the single most central line of code in Faster R-CNN:
```python
layers = self.net.create_architecture(sess, "TRAIN", self.imdb.num_classes, tag='default')  # the core call
```
This function defines the model's entire forward pipeline, which breaks down into four major modules (the sketch after this list shows how they chain together):

(1) Set up the backbone network (this project uses VGG16):

```python
net = self.build_head(is_training)
```

(2) Define the RPN network:

```python
rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape = self.build_rpn(net, is_training, initializer)
```

(3) Return the ROIs that feed the final bounding-box regression and object classification. Of course, this function is far from being that simple; it has several other important responsibilities, and I will devote a lot of space to this layer:

```python
rois = self.build_proposals(is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score)
```

(4) Finally, our output prediction layer:

```python
cls_score, cls_prob, bbox_pred = self.build_predictions(net, rois, is_training, initializer, initializer_bbox)
```
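Roughly, these four calls chain together as follows. This is a minimal sketch assembled from the snippets above, not the author's verbatim code; in particular, the two truncated-normal initializers are my assumption about what gets passed down to the layers.

```python
import tensorflow as tf  # TF1-era code, matching the slim-based repo

def build_network(self, is_training=True):
    # Assumed initializers (not quoted from the repo): small truncated
    # normals for the conv/fc heads and the bbox-regression head.
    initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
    initializer_bbox = tf.truncated_normal_initializer(mean=0.0, stddev=0.001)
    # (1) VGG16 backbone -> shared feature map
    net = self.build_head(is_training)
    # (2) RPN classification and regression heads on that feature map
    rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape = \
        self.build_rpn(net, is_training, initializer)
    # (3) turn RPN outputs into ROIs (proposal layer, covered in the next post)
    rois = self.build_proposals(is_training, rpn_cls_prob, rpn_bbox_pred, rpn_cls_score)
    # (4) per-ROI class scores, probabilities, and box refinements
    cls_score, cls_prob, bbox_pred = self.build_predictions(
        net, rois, is_training, initializer, initializer_bbox)
    return rois, cls_prob, bbox_pred
```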
Below I will walk through each of these four layers. Since the third one involves some fairly complicated material, I will cover it in a separate post. This post focuses on what the build_head and build_rpn layers do.

The build_head layer:
```python
def build_head(self, is_training):
    # Main network: layers 1-5 are the VGG16 convolutional backbone.
    # self._image is the network input; in slim.repeat(..., 2, slim.conv2d, 64, [3, 3], ...)
    # the 2 means slim.conv2d is applied twice, each time with 64 output
    # channels and a 3x3 kernel.
    # Layer 1
    net = slim.repeat(self._image, 2, slim.conv2d, 64, [3, 3], trainable=False, scope='conv1')
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool1')
    # Layer 2
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], trainable=False, scope='conv2')
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool2')
    # Layer 3
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], trainable=is_training, scope='conv3')
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool3')
    # Layer 4
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv4')
    net = slim.max_pool2d(net, [2, 2], padding='SAME', scope='pool4')
    # Layer 5 (no pooling afterwards, so the total stride stays at 16)
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], trainable=is_training, scope='conv5')
    # Append network to summaries
    self._act_summaries.append(net)
    # Append network as head layer
    self._layers['head'] = net
    return net
```
This part is actually quite simple: it runs the input image through the convolutional blocks of VGG16 to extract features and returns the resulting feature map. Note that conv1 and conv2 are frozen (trainable=False), and since there are four 2x2 max pools and no pooling after conv5, the feature map is 1/16 the spatial size of the input.
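As a quick sanity check on that 1/16 claim, here is a tiny sketch (my own arithmetic, not code from the repo) of the output size the head produces:

```python
import math

def head_output_size(h, w, num_pools=4):
    # Four 'SAME'-padded 2x2/stride-2 max pools: each one ceil-divides
    # the spatial dimensions by 2, for an overall stride of 16.
    for _ in range(num_pools):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
    return h, w

print(head_output_size(600, 800))  # -> (38, 50): the feature map fed to the RPN
```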
The build_rpn layer:
```python
def build_rpn(self, net, is_training, initializer):
    # Build anchor component
    self._anchor_component()
    # Create RPN Layer: a 3x3 conv slides over the shared feature map
    rpn = slim.conv2d(net, 512, [3, 3], trainable=is_training,
                      weights_initializer=initializer, scope="rpn_conv/3x3")
    self._act_summaries.append(rpn)
    # 1x1 conv: 2 scores (foreground/background) per anchor -> 2 * num_anchors channels
    rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=is_training,
                                weights_initializer=initializer, padding='VALID',
                                activation_fn=None, scope='rpn_cls_score')
    # Change it so that the score has 2 as its channel size
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, 'rpn_cls_score_reshape')
    rpn_cls_prob_reshape = self._softmax_layer(rpn_cls_score_reshape, "rpn_cls_prob_reshape")
    rpn_cls_prob = self._reshape_layer(rpn_cls_prob_reshape, self._num_anchors * 2, "rpn_cls_prob")
    # 1x1 conv: 4 box-regression deltas per anchor -> 4 * num_anchors channels
    rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=is_training,
                                weights_initializer=initializer, padding='VALID',
                                activation_fn=None, scope='rpn_bbox_pred')
    return rpn_cls_prob, rpn_bbox_pred, rpn_cls_score, rpn_cls_score_reshape
```
This builds our RPN layer. The hardest piece to understand is the _anchor_component function, which generates the anchors; I covered it in an earlier post, so I won't repeat it here. Among the return values, rpn_cls_score is the raw score without softmax activation; the reshape exists mainly so that the softmax (and, later, the loss) can be computed over a two-way foreground/background channel. rpn_cls_prob is the probability obtained after our softmax. I have also written a post on how the RPN works, for anyone interested.
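To make the reshape trick concrete, here is a minimal NumPy sketch of the shape gymnastics. This is my own illustration of what _reshape_layer and _softmax_layer accomplish, not the repo's TensorFlow code; the anchor count of 9 (3 scales x 3 aspect ratios) is the paper's default.

```python
import numpy as np

def reshape_scores(x, num_dim):
    """Mimic _reshape_layer: NHWC -> NCHW, regroup the channel axis into
    num_dim channels (folding the remainder into the height axis), then
    transpose back to NHWC."""
    nchw = x.transpose(0, 3, 1, 2)
    regrouped = nchw.reshape(x.shape[0], num_dim, -1, x.shape[2])
    return regrouped.transpose(0, 2, 3, 1)

def softmax_last_axis(x):
    # Numerically stable softmax over the last axis (the bg/fg pair)
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

A, H, W = 9, 38, 50                                 # 9 anchors per feature-map position
rpn_cls_score = np.random.randn(1, H, W, 2 * A)     # conv output: (1, 38, 50, 18)
score_reshape = reshape_scores(rpn_cls_score, 2)    # (1, 342, 50, 2): one bg/fg pair per entry
prob_reshape = softmax_last_axis(score_reshape)     # softmax over each bg/fg pair
rpn_cls_prob = reshape_scores(prob_reshape, 2 * A)  # back to (1, 38, 50, 18)
print(score_reshape.shape, rpn_cls_prob.shape)
```

The same helper inverts its own transformation: called with num_dim=2, it isolates the two-way channel for the softmax, and called with num_dim=2*A it restores the original layout, which matches how _reshape_layer is invoked twice in build_rpn above.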