Understanding the RPN (Region Proposal Network) Through Its Source Code

Latest recommended article published 2024-02-12 16:08:15

GodWriter


Category: Deep Learning. Tags: Faster-RCNN, RPN, source code walkthrough

Article link: https://blog.csdn.net/godwriter/article/details/105958195


1. Anchor Generation Layer

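My understanding of the anchor generation source code comes mainly from two code bases:

- RBG's Caffe source: https://github.com/rbgirshick/py-faster-rcnn
- A PyTorch reimplementation on GitHub: https://github.com/chenyuntc/simple-faster-rcnn-pytorch

The two generate anchors with different techniques, so they are discussed separately. The focus is on RBG's code, which is used to explain both the principle and the tricks of anchor generation.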

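1.1 Caffe source

First, the important parameters:
- base_size=16: the feature map produced by the convolution and pooling layers is $\frac{1}{16}$ the size of the input image, so one cell of the feature map used to place anchors corresponds to a $16 \times 16$ region of the input image.
- ratios=[0.5, 1, 2]: the aspect ratios (height : width) at a fixed anchor area, i.e. $[1:2 \quad 1:1 \quad 2:1]$.
- scales=[8, 16, 32]: the factors by which the anchors are enlarged; where they are used is explained in detail below.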
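
As a quick sanity check on these parameters, the sketch below evaluates the default `scales` expression used in the source and counts the feature-map cells for one input size. The 608 × 800 input is an assumption chosen only for illustration; it does not come from either repository.

```python
import numpy as np

base_size = 16                    # feature stride: one cell covers 16x16 image pixels
ratios = np.array([0.5, 1, 2])    # height / width of each anchor shape
scales = 2 ** np.arange(3, 6)     # default argument of generate_anchors()

# Illustrative (assumed) input size; any multiple of 16 behaves the same way.
image_h, image_w = 608, 800
feat_h, feat_w = image_h // base_size, image_w // base_size

print(scales.tolist())               # [8, 16, 32]
print(feat_h, feat_w)                # 38 50
print(len(ratios) * len(scales))     # 9 anchor shapes per feature-map cell
```
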
Next, let us walk through the anchor generation flow in RBG's source.
- ```
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """

    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in xrange(ratio_anchors.shape[0])])
    return anchors
```
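  - generate_anchors() is where everything starts. It first defines base_anchor. Because image coordinates have their origin (0, 0) at the top-left corner, base_anchor in (xmin, ymin, xmax, ymax) form is (0, 0, 15, 15). Note that `xrange` exists only in Python 2; under Python 3, use `range`.
  - Next, it calls _ratio_enum(), shown below.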
- ```
def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
```
  - To compute w, h, x_ctr, y_ctr, it in turn calls _whctrs(), shown below.
- ```
def _whctrs(anchor):
    """
    Return width, height, x center, and y center for an anchor (window).
    """

    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr
```
  - _whctrs() takes (top-left x, top-left y, bottom-right x, bottom-right y) and converts it to (width, height, center x, center y).
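
The conversion can be checked on the base anchor with a minimal sketch. `whctrs` mirrors the function above; `to_corners` is a hypothetical single-box inverse added here only to show the round trip, and is not part of the original source.

```python
import numpy as np

def whctrs(anchor):
    """Corner form (xmin, ymin, xmax, ymax) -> (w, h, x_ctr, y_ctr)."""
    w = anchor[2] - anchor[0] + 1   # +1 because both end pixels are counted
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr

def to_corners(w, h, x_ctr, y_ctr):
    """Hypothetical inverse of whctrs for a single box."""
    return np.array([x_ctr - 0.5 * (w - 1), y_ctr - 0.5 * (h - 1),
                     x_ctr + 0.5 * (w - 1), y_ctr + 0.5 * (h - 1)])

base_anchor = np.array([1, 1, 16, 16]) - 1       # (0, 0, 15, 15)
w, h, x_ctr, y_ctr = whctrs(base_anchor)
print(w, h, x_ctr, y_ctr)                        # 16 16 7.5 7.5
print(to_corners(w, h, x_ctr, y_ctr).tolist())   # [0.0, 0.0, 15.0, 15.0]
```
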
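- Back to _ratio_enum():
  - base_anchor in (width, height, center x, center y) form works out to (16, 16, 7.5, 7.5).
  - size = w × h = 16 × 16 = 256
  - size_ratios = $\frac{256}{[0.5 \quad 1 \quad 2]}$ = $[512, 256, 128]$
  - Taking the square root of size_ratios and rounding gives ws = [23, 16, 11].
  - Multiplying ws by ratios and rounding gives hs = [12, 16, 22].
  - ws and hs are the widths and heights of anchors that share the same area but differ in aspect ratio. Because of the rounding, the areas ws × hs are not exactly equal (276, 256 and 242 here).
  - With these values in hand, _mkanchors() is called to return the computed anchors. The function is shown below.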
- ```
def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given a vector of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows).
    """

    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors
```
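  - From the code above, the four coordinate columns are computed as follows (note that the code uses ws - 1 and hs - 1):
    
    $x_{min} = 7.5 - \frac{1}{2}\left[\begin{matrix} 22 \\ 15 \\ 10 \end{matrix}\right] = \left[\begin{matrix} -3.5\\ 0\\ 2.5\end{matrix}\right]$
    
    $y_{min} = 7.5 - \frac{1}{2}\left[\begin{matrix} 11\\ 15\\ 21\end{matrix}\right] = \left[\begin{matrix} 2\\ 0\\ -3\end{matrix}\right]$
    
    $x_{max} = 7.5 + \frac{1}{2}\left[\begin{matrix} 22 \\ 15 \\ 10 \end{matrix}\right] = \left[\begin{matrix} 18.5\\ 15\\ 12.5\end{matrix}\right]$
    
    $y_{max} = 7.5 + \frac{1}{2}\left[\begin{matrix} 11\\ 15\\ 21\end{matrix}\right] = \left[\begin{matrix} 13\\ 15\\ 18\end{matrix}\right]$
  - The resulting anchors are $\left[\begin{matrix} -3.5 & 2 & 18.5 & 13\\ 0 & 0 & 15 & 15\\ 2.5 & -3 & 12.5 & 18\end{matrix}\right]$
  - These are the anchor coordinates for the different aspect ratios, all centered at (7.5, 7.5) with an area of roughly 256 (exactly 256 before rounding). Each formula takes the center coordinate 7.5 and subtracts or adds half of (width - 1) or (height - 1), which yields new coordinates in (top-left x, top-left y, bottom-right x, bottom-right y) form. Some coordinates are negative because the top-left corner falls outside the image.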
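
The arithmetic of the ratio enumeration can be reproduced in a few lines. This is a minimal sketch that inlines the steps of _ratio_enum() and _mkanchors() for the base anchor; it is a check of the numbers, not a replacement for the original functions.

```python
import numpy as np

ratios = np.array([0.5, 1, 2])
w, h, x_ctr, y_ctr = 16, 16, 7.5, 7.5      # base anchor in center form

size_ratios = (w * h) / ratios             # [512, 256, 128]
ws = np.round(np.sqrt(size_ratios))        # [23, 16, 11]
hs = np.round(ws * ratios)                 # [12, 16, 22]

# Same formulas as _mkanchors(), one column per coordinate.
ratio_anchors = np.stack([x_ctr - 0.5 * (ws - 1),
                          y_ctr - 0.5 * (hs - 1),
                          x_ctr + 0.5 * (ws - 1),
                          y_ctr + 0.5 * (hs - 1)], axis=1)

print(ratio_anchors.tolist())
print((ws * hs).tolist())                  # areas after rounding: [276.0, 256.0, 242.0]
```
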
- With these anchors, we return to generate_anchors():
  - Through the chain of calls above we have obtained ratio_anchors, i.e. $\left[\begin{matrix} -3.5 & 2 & 18.5 & 13\\ 0 & 0 & 15 & 15\\ 2.5 & -3 & 12.5 & 18\end{matrix}\right]$
  - The last step calls _scale_enum() to obtain anchors at every scale for every aspect ratio. The scales are [8, 16, 32]. _scale_enum() is called once for each row of ratio_anchors (one aspect ratio, centered at (7.5, 7.5)); each call returns 3 rescaled anchors, one per scale, so there are 9 anchors in total. _scale_enum() is shown below.
- ```
def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """

    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
```
  - Take $[-3.5 \quad 2 \quad 18.5 \quad 13]$ as an example.
  - Calling _whctrs() gives the center form: w, h, x_ctr, y_ctr = $[23 \quad 12 \quad 7.5 \quad 7.5]$
  - $23 \times \left[\begin{matrix} 8\\ 16\\ 32\end{matrix}\right] = \left[\begin{matrix} 184\\ 368\\ 736\end{matrix}\right]$, i.e. the width of 23 enlarged by each scale
  - $12 \times \left[\begin{matrix} 8\\ 16\\ 32\end{matrix}\right] = \left[\begin{matrix} 96\\ 192\\ 384\end{matrix}\right]$, i.e. the height of 12 enlarged by each scale
  - The center stays at (7.5, 7.5) while the width and height change, so _mkanchors() is called again to recompute the corner coordinates around the same center for the new widths and heights.
  - The resulting anchor coordinates are $\left[\begin{matrix} -84 & -40 & 99 & 55\\ -176 & -88 & 191 & 103\\ -360 & -184 & 375 & 199\end{matrix}\right]$
This completes the walkthrough of RBG's method for generating anchors.
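
The whole flow can be put together as a compact Python 3 sketch. The function names drop the leading underscores and `ratios` is converted to an array explicitly; these are choices made for this sketch, while the arithmetic follows the functions quoted above.

```python
import numpy as np

def whctrs(anchor):
    """(xmin, ymin, xmax, ymax) -> (w, h, x_ctr, y_ctr)."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)

def mkanchors(ws, hs, x_ctr, y_ctr):
    """Build corner-form anchors from vectors of widths and heights."""
    ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))

def ratio_enum(anchor, ratios):
    """One anchor per aspect ratio, at (roughly) constant area."""
    w, h, x_ctr, y_ctr = whctrs(anchor)
    ws = np.round(np.sqrt(w * h / ratios))
    hs = np.round(ws * ratios)
    return mkanchors(ws, hs, x_ctr, y_ctr)

def scale_enum(anchor, scales):
    """One anchor per scale, keeping the aspect ratio and center."""
    w, h, x_ctr, y_ctr = whctrs(anchor)
    return mkanchors(w * scales, h * scales, x_ctr, y_ctr)

def generate_anchors(base_size=16, ratios=(0.5, 1, 2),
                     scales=2 ** np.arange(3, 6)):
    ratios = np.asarray(ratios, dtype=float)
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    ratio_anchors = ratio_enum(base_anchor, ratios)
    return np.vstack([scale_enum(row, scales) for row in ratio_anchors])

anchors = generate_anchors()
print(anchors.shape)          # (9, 4)
print(anchors[:3].tolist())   # the three scales of the first aspect ratio
```

The rows are ordered ratio-major: rows 0 to 2 are the three scales of the first aspect ratio, rows 3 to 5 the second, and rows 6 to 8 the third.
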

1.2 PyTorch source

The PyTorch version needs less explanation; the code is short and easy to follow:

```
import numpy as np
import six


def generate_anchor_base(base_size=16, ratios=[0.5, 1, 2],
                         anchor_scales=[8, 16, 32]):
    """
    Returns:
        ~numpy.ndarray:
        An array of shape :math:`(R, 4)`.
        Each element is a set of coordinates of a bounding box.
        The second axis corresponds to
        :math:`(y_{min}, x_{min}, y_{max}, x_{max})` of a bounding box.
    """
    # Center of the base cell; note there is no "-1" here.
    py = base_size / 2.
    px = base_size / 2.

    anchor_base = np.zeros((len(ratios) * len(anchor_scales), 4),
                           dtype=np.float32)
    for i in six.moves.range(len(ratios)):
        for j in six.moves.range(len(anchor_scales)):
            # h / w == ratios[i] and h * w == (base_size * scale) ** 2
            h = base_size * anchor_scales[j] * np.sqrt(ratios[i])
            w = base_size * anchor_scales[j] * np.sqrt(1. / ratios[i])

            index = i * len(anchor_scales) + j
            anchor_base[index, 0] = py - h / 2.
            anchor_base[index, 1] = px - w / 2.
            anchor_base[index, 2] = py + h / 2.
            anchor_base[index, 3] = px + w / 2.
    return anchor_base
```

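- The parameters are the same as in the Caffe version; the difference lies in how anchor_base is computed.
- There is no "-1" here: the center is (base_size / 2, base_size / 2) = (8, 8) rather than (7.5, 7.5), and widths and heights are not treated as inclusive pixel counts.
- Two nested loops run 9 times in total, each iteration producing the coordinates of one anchor.
- The square root of ratios is what keeps the area fixed while setting the aspect ratio: with s = base_size × anchor_scale, h = s·√r and w = s/√r give h / w = r and h × w = s².
- Finally, each coordinate of anchor_base is computed directly from the center, in (ymin, xmin, ymax, xmax) order.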
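
The role of the square root can be verified numerically. The following is a vectorized sketch of the same computation using NumPy broadcasting; the vectorization and the variable name `side` are choices made here and are not how the repository writes it.

```python
import numpy as np

base_size = 16
ratios = np.array([0.5, 1, 2])
anchor_scales = np.array([8, 16, 32])

# Broadcast to a (3, 3) grid: rows follow ratios, columns follow scales.
side = base_size * anchor_scales[np.newaxis, :]
h = side * np.sqrt(ratios)[:, np.newaxis]
w = side * np.sqrt(1.0 / ratios)[:, np.newaxis]

print(np.allclose(h / w, ratios[:, np.newaxis]))   # True: aspect ratio is r
print(np.allclose(h * w, side ** 2))               # True: area is s squared

py = px = base_size / 2.0
anchor_base = np.stack([py - h / 2, px - w / 2,
                        py + h / 2, px + w / 2], axis=-1).reshape(-1, 4)
print(anchor_base.shape)         # (9, 4)
print(anchor_base[3].tolist())   # ratio 1, scale 8: [-56.0, -56.0, 72.0, 72.0]
```

For the same ratio and scale the Caffe code yields (-56, -56, 71, 71) in (xmin, ymin, xmax, ymax) order, so the two versions differ by the inclusive-pixel "-1" as well as by coordinate order.
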

This article is the author's original work; please credit the source when reposting.
