A Labor of Love: Dissecting the RPN Code

Article link: https://blog.csdn.net/weixin_37340613/article/details/83473211

Author:哪咔吗
Date:2018-10-28
Code source:https://github.com/endernewton/tf-faster-rcnn

Overview

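RPN (Region Proposal Network) is the region proposal method introduced in the Faster R-CNN paper. Its defining feature is that it reuses the feature map produced by the feature-extraction network to generate region proposals quickly. While working through the Faster R-CNN code in detail, I found the RPN part to be the most complex and the most obscure, so I wrote this post to dissect the RPN code thoroughly.
When TensorFlow builds the graph, no data has actually been fed yet, so when debugging in PyCharm you cannot see tensor values. At best you see tensor shapes, and many of those look like shape=(1,?,?,4). You can of course read every operation carefully and work out what each tensor means, but I can say from experience that this is a very painful process.
Fortunately, tfdbg lets me inspect the graph as it runs after data has been fed, so I can view the tensor at every compute node. This only works well if the graph was built with a well-organized name-scope hierarchy, which is what tf.variable_scope provides. For how to use tfdbg, see my other post, "TensorFlow tfdbg usage notes".
The code below is the RPN. I split it into three parts: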

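1. Generate the anchors.
2. Sample one batch of anchors from those generated in step 1.
3. Using the anchors sampled in step 2, extract feature vectors from the feature map produced by the feature-extraction network, i.e. ROI pooling.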

with tf.variable_scope('hjf_rpn', 'hjf_rpn'):  # self._scope, self._scope
    # 1. Generate anchors, shape=(?,4)
    self._anchor_component()
    # 2. RPN: select a fixed number (256) of anchors and fill in their attributes,
    #    e.g. foreground/background labels and the matching gt box
    rois = self._region_proposal(net_conv, training, initializer)
    # 3. ROI pooling
    if cfg.POOLING_MODE == 'crop':
        pool5 = self._crop_pool_layer(net_conv, rois, "pool5")
    else:
        raise NotImplementedError

1 Generating anchors

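Here is the full code for this part first. Each step is explained in detail afterwards, including the structure of the output tensors and their concrete values.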

def _anchor_component(self):
    with tf.variable_scope('anchor_component') as scope:
        # just to get the shape right
        height = tf.to_int32(tf.ceil(self._im_info[0] / np.float32(self._feat_stride[0])))   # ceil(600/16) = ceil(37.5) = 38
        width = tf.to_int32(tf.ceil(self._im_info[1] / np.float32(self._feat_stride[0])))    # 800/16 = 50
        if cfg.USE_E2E_TF:
            shift_x = tf.range(width) * self._feat_stride  # [0,16,32,48,64,80,96...784]   shape=[50]
            shift_y = tf.range(height) * self._feat_stride  # [0,16,32,48,64,80,96...592]  shape=[38]
            shift_x, shift_y = tf.meshgrid(shift_x, shift_y)  # shape=(38,50)
            sx = tf.reshape(shift_x, shape=(-1,))  # flatten the matrix into a 1-D vector, shape=(1900,)
            sy = tf.reshape(shift_y, shape=(-1,))  # flatten the matrix into a 1-D vector, shape=(1900,)
            shifts = tf.transpose(tf.stack([sx, sy, sx, sy]))  # shape=(1900,4), in coordinates of the 800*600 image
            K = tf.multiply(width, height)  # 1900
            shifts = tf.transpose(tf.reshape(shifts, shape=[1, K, 4]), perm=(1, 0, 2))  # reshape to (1900,1,4)

            anchors = generate_anchors(ratios=np.array(self._anchor_ratios), scales=np.array(self._anchor_scales))  # the 9 base anchors
            A = anchors.shape[0]
            # convert the anchors to a TensorFlow constant
            anchor_constant = tf.constant(anchors.reshape((1, A, 4)), dtype=tf.int32)  # shape=(1,9,4)

            length = K * A  # 1900*9=17100
            anchors_tf = tf.reshape(tf.add(anchor_constant, shifts), shape=(length, 4))  # shape=(17100,4)
            anchors = tf.cast(anchors_tf, dtype=tf.float32)
            anchor_length = length

        else:
            anchors, anchor_length = tf.py_func(generate_anchors_pre,
                                                [height, width,
                                                 self._feat_stride, self._anchor_scales, self._anchor_ratios],
                                                [tf.float32, tf.int32], name="generate_anchors")
        anchors.set_shape([None, 4])
        anchor_length.set_shape([])
        self._anchors = anchors
        self._anchor_length = anchor_length

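Let's start with the part below, which computes the width and height of the feature map. The original image is 500×375 and is resized to 800×600 before entering the network (from here on, "original image" always means the resized 800×600 image). The feature-extraction network used in this post is the first 3 blocks of ResNet-101, which shrinks the feature map to 1/16 of the input image (from here on, the feature map produced by the feature-extraction network is simply called "the feature map"). The code therefore gives height = 600/16 = 37.5, which tf.ceil rounds up to 38 before the to_int32 cast, and width = 800/16 = 50.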

with tf.variable_scope('anchor_component') as scope:
    # just to get the shape right
    height = tf.to_int32(tf.ceil(self._im_info[0] / np.float32(self._feat_stride[0])))   # ceil(600/16) = ceil(37.5) = 38
    width = tf.to_int32(tf.ceil(self._im_info[1] / np.float32(self._feat_stride[0])))    # 800/16=50
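
The same arithmetic can be checked outside the graph. This is a minimal sketch in plain Python; the helper name feature_map_size is made up for illustration and does not exist in the repository, and the 600×800 input and stride of 16 are the values assumed throughout this post.

```python
import math

def feature_map_size(im_height, im_width, feat_stride=16):
    """Hypothetical helper: feature-map (height, width), rounding up like tf.ceil."""
    height = int(math.ceil(im_height / float(feat_stride)))
    width = int(math.ceil(im_width / float(feat_stride)))
    return height, width

height, width = feature_map_size(600, 800)
print(height, width)      # 38 50
print(height * width)     # 1900 feature-map positions
```

Rounding up rather than down matters for the height here: 600 is not a multiple of 16, so flooring would give 37 rows and a different anchor count.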

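Next, we need some basic rules of anchor generation. The number of anchors is feature-map width × height × A, where A is the number of anchors per position (usually A = 9, i.e. 3 scales × 3 ratios), so in this example the total is 50×38×9 = 17100. In other words, 9 anchors are generated at every point of the feature map. The code below maps each point of the feature map to its coordinates in the original image. The resulting shifts is a tensor with shape=(1900,1,4).
[Figure: values of the shifts tensor; image not available]
This tensor should be read as a mapping from the feature-map points (50*38 = 1900) to pixels of the original image. Around each of these 1900 points we will generate the corresponding 9 anchors.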

if cfg.USE_E2E_TF:
    shift_x = tf.range(width) * self._feat_stride  # [0,16,32,48,64,80,96...784]   shape=[50]
    shift_y = tf.range(height) * self._feat_stride  # [0,16,32,48,64,80,96...592]  shape=[38]
    shift_x, shift_y = tf.meshgrid(shift_x, shift_y)  # shape=(38,50)
    sx = tf.reshape(shift_x, shape=(-1,))  # flatten the matrix into a 1-D vector, shape=(1900,)
    sy = tf.reshape(shift_y, shape=(-1,))  # flatten the matrix into a 1-D vector, shape=(1900,)
    shifts = tf.transpose(tf.stack([sx, sy, sx, sy]))  # shape=(1900,4), in coordinates of the 800*600 image
    K = tf.multiply(width, height)  # 1900
    shifts = tf.transpose(tf.reshape(shifts, shape=[1, K, 4]), perm=(1, 0, 2))  # reshape to (1900,1,4)
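
Since tensor values are hard to see while the graph is being built, it can help to replay these steps in NumPy, where every intermediate value can be printed. The sketch below mirrors the TensorFlow calls one to one, assuming height=38, width=50 and a stride of 16 as in this example. It is an illustration, not code from the repository.

```python
import numpy as np

feat_stride = 16
height, width = 38, 50

shift_x = np.arange(width) * feat_stride            # [0, 16, ..., 784], shape (50,)
shift_y = np.arange(height) * feat_stride           # [0, 16, ..., 592], shape (38,)
shift_x, shift_y = np.meshgrid(shift_x, shift_y)    # both have shape (38, 50)
sx = shift_x.reshape(-1)                            # shape (1900,)
sy = shift_y.reshape(-1)                            # shape (1900,)
shifts = np.stack([sx, sy, sx, sy]).T               # shape (1900, 4)
K = width * height                                  # 1900
shifts = shifts.reshape(1, K, 4).transpose(1, 0, 2)  # shape (1900, 1, 4)

print(shifts.shape)      # (1900, 1, 4)
print(shifts[0, 0])      # [0 0 0 0]
print(shifts[1, 0])      # [16  0 16  0]
print(shifts[50, 0])     # [ 0 16  0 16]
print(shifts[-1, 0])     # [784 592 784 592]
```

Each row has the form (x, y, x, y) because the same offset is later added to both corners (x1, y1, x2, y2) of an anchor box. The rows walk along x first, so index 1 is one stride to the right and index 50 is one stride down.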

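Anchor sizes are determined jointly by anchor_scales and anchor_ratios. In this example anchor_ratios=[0.5,1,2] and anchor_scales=[8,16,32]. anchor_ratios is the height-to-width ratio of the anchor box (ratio=0.5 gives a box twice as wide as it is tall), and anchor_scales is the size of the anchor box. You may wonder how [8,16,32] can be anchor sizes. These sizes are expressed in feature-map units, so with a stride of 16 they correspond to [128,256,512] in the original image. As mentioned above, each feature-map point gets 9 anchors, i.e. 3 anchors with different aspect ratios for each size. Taking size=256 as an example, the 3 anchors should look like this (the actual values differ slightly because of how rounding is done; this only illustrates the rule, and the small differences have no real effect on detection results):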

ratio=0.5 -> width=362,height=181
ratio=1 -> width=256,height=256
ratio=2 -> width=181,height=362
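
The three sizes above follow from keeping the area near size² while setting height = ratio × width. Here is a minimal sketch of that idealized rule in plain Python. The function name anchor_size is hypothetical, and this is not the rounding the repository actually uses.

```python
import math

def anchor_size(size_px, ratio):
    """Hypothetical helper: (width, height) with area ~ size_px**2 and height/width = ratio."""
    w = math.sqrt(size_px * size_px / ratio)
    h = w * ratio
    return int(round(w)), int(round(h))

for ratio in (0.5, 1, 2):
    print(ratio, anchor_size(256, ratio))
# 0.5 (362, 181)
# 1 (256, 256)
# 2 (181, 362)
```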

The tensor anchor_constant produced by the code below has the following values:
[[ -84. -40. 99. 55.]
[-176. -88. 191. 103.]
[-360. -184. 375. 199.]
[ -56. -56. 71. 71.]
[-120. -120. 135. 135.]
[-248. -248. 263. 263.]
[ -36. -80. 51. 95.]
[ -80. -168. 95. 183.]
[-168. -344. 183. 359.]]
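Given a point in the original image (the previous step produced these points), take that point as the center and treat the 9 boxes above as relative offsets (make sure you understand what "relative" means here). This gives the 9 anchors surrounding the point. Of course, some anchors will extend beyond the original image.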
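
To see where these 9 rows come from, here is a compact NumPy sketch that reproduces them. It assumes a 16×16 base box centered at (7.5, 7.5), ratios applied before scales, and rounding of the ratio-adjusted width and height before scaling. The function name base_anchors is made up for this post, and the code is a re-derivation for illustration, not the repository's generate_anchors.

```python
import numpy as np

def base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Hypothetical re-derivation of the 9 base anchors as (x1, y1, x2, y2) rows."""
    ctr = (base_size - 1) / 2.0          # center of the base box: 7.5
    area = base_size * base_size         # 256
    rows = []
    for r in ratios:
        w = np.round(np.sqrt(area / r))  # width for this ratio, rounded first
        h = np.round(w * r)              # height = ratio * width
        for s in scales:
            ws, hs = w * s, h * s        # then scaled by 8, 16, 32
            rows.append([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                         ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)])
    return np.array(rows)

anchors = base_anchors()
print(anchors)
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
print(widths)    # [184. 368. 736. 128. 256. 512.  88. 176. 352.]
print(heights)   # [ 96. 192. 384. 128. 256. 512. 176. 352. 704.]
```

Because the rounding happens on the small 16-pixel box before scaling, the size-256, ratio-0.5 anchor comes out as 368×192 rather than the idealized 362×181. This is the small discrepancy mentioned earlier.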

anchors = generate_anchors(ratios=np.array(self._anchor_ratios), scales=np.array(self._anchor_scales))  # the 9 base anchors
A = anchors.shape[0]
# convert the anchors to a TensorFlow constant
anchor_constant = tf.constant(anchors.reshape((1, A, 4)), dtype=tf.int32)  # shape=(1,9,4)

length = K * A  # 1900*9=17100
anchors_tf = tf.reshape(tf.add(anchor_constant, shifts), shape=(length, 4))  # shape=(17100,4)
anchors=tf.cast(anchors_tf, dtype=tf.float32)
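
The key step is the tf.add: a (1,9,4) tensor plus a (1900,1,4) tensor broadcasts to (1900,9,4), i.e. every base anchor is shifted to every position. Below is a minimal NumPy sketch of the same broadcast, with the 9 base anchors copied from the values listed earlier and an 800×600 image assumed. The inside mask at the end is my own addition for illustration and is not part of this function in the repository.

```python
import numpy as np

# The 9 base anchors (x1, y1, x2, y2), copied from the values listed in this post.
base = np.array([[-84, -40, 99, 55], [-176, -88, 191, 103], [-360, -184, 375, 199],
                 [-56, -56, 71, 71], [-120, -120, 135, 135], [-248, -248, 263, 263],
                 [-36, -80, 51, 95], [-80, -168, 95, 183], [-168, -344, 183, 359]],
                dtype=np.int32)
A = base.shape[0]                                    # 9

feat_stride, height, width = 16, 38, 50
sx, sy = np.meshgrid(np.arange(width) * feat_stride,
                     np.arange(height) * feat_stride)
shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)  # (1900, 4)
K = shifts.shape[0]                                  # 1900

# (1, 9, 4) + (1900, 1, 4) broadcasts to (1900, 9, 4), then flattens to (17100, 4).
anchors = (base.reshape(1, A, 4) + shifts.reshape(K, 1, 4)).reshape(K * A, 4)
anchors = anchors.astype(np.float32)

print(anchors.shape)   # (17100, 4)
print(anchors[0])      # [-84. -40.  99.  55.]  first base anchor at position (0, 0)
print(anchors[9])      # [-68. -40. 115.  55.]  same base anchor, shifted 16 px right

# Illustration only: which anchors lie entirely inside the 800x600 image.
inside = ((anchors[:, 0] >= 0) & (anchors[:, 1] >= 0) &
          (anchors[:, 2] < 800) & (anchors[:, 3] < 600))
print(inside.sum(), "of", K * A, "anchors lie fully inside the image")
```

The flattened order keeps the 9 anchors of one position together, so rows 0 to 8 belong to position (0, 0), rows 9 to 17 to the next position along x, and so on.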
