Anchors in Faster RCNN

马鹤宁

Last modified: 2023-03-21 10:16:01

First published: 2021-03-03 21:53:53

Article link: https://blog.csdn.net/weixin_42111770/article/details/114337335

Anchor computation

The file py-faster-rcnn/lib/rpn/generate_anchors.py defines how anchors are generated. The code is as follows:

import numpy as np
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchors for the (0, 0, 15, 15) region of the original image, with ratios [0.5, 1, 2] and scales [8, 16, 32].
    """
    # base_anchor = [0, 0, 15, 15]
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    # Enumerate the anchors for each aspect ratio
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    # Apply each scale to the ratio-enumerated anchors
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors

def _whctrs(anchor):
    """
    Return the width w, height h, and center (x_ctr, y_ctr) of an anchor.
    """

    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr

def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given vectors of widths (ws) and heights (hs) around a center (x_ctr, y_ctr), output the top-left and bottom-right coordinates of the resulting anchors.
    """

    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors

def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """
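    # Width, height, and center of the input anchor
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    # Area of the anchor
    size = w * h
    # Width and height for each ratio (values shown for the 16x16 base anchor)
    size_ratios = size / ratios  # [512, 256, 128]
    ws = np.round(np.sqrt(size_ratios))  # ws = [23, 16, 11]
    hs = np.round(ws * ratios)  # hs = [12, 16, 22]
    # Top-left and bottom-right coordinates of the anchor for each ratio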
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """
    # Width, height, and center of the input anchor
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    # Width and height for each scale
    ws = w * scales
    hs = h * scales
    # Top-left and bottom-right coordinates of the anchor for each scale
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

if __name__ == '__main__':
    import time
    t = time.time()
    # Generate the anchors, then print the elapsed time and the result
    a = generate_anchors()
    print(time.time() - t)
    print(a)
    from IPython import embed; embed()

The table below lists the coordinates of the 9 anchors generated with base_anchor = [0, 0, 15, 15], ratios = [0.5, 1, 2], and scales = [8, 16, 32].

ratio (ws, hs)	scale (w*h)	coordinates (x1, y1, x2, y2)
0.5 (23, 12)	8 (184*96)	-84, -40, 99, 55
0.5 (23, 12)	16 (368*192)	-176, -88, 191, 103
0.5 (23, 12)	32 (736*384)	-360, -184, 375, 199
1 (16, 16)	8 (128*128)	-56, -56, 71, 71
1 (16, 16)	16 (256*256)	-120, -120, 135, 135
1 (16, 16)	32 (512*512)	-248, -248, 263, 263
2 (11, 22)	8 (88*176)	-36, -80, 51, 95
2 (11, 22)	16 (176*352)	-80, -168, 95, 183
2 (11, 22)	32 (352*704)	-168, -344, 183, 359

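Following the code above, with three aspect ratios, ratio = [0.5, 1, 2], and three scales, scale = [8, 16, 32], the 3*3 combinations give 9 anchor boxes. Here ratio is the height-to-width ratio and determines the shape of a box, while scale determines its size.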
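The enumeration can also be written in a few vectorized lines. The following is a minimal sketch, not the py-faster-rcnn implementation: `nine_anchors` is a hypothetical helper name, and it assumes the same conventions as the code above (inclusive pixel coordinates, rounding of the ratio step to whole pixels before scaling).

```python
import numpy as np

def nine_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Vectorized sketch of the ratio/scale enumeration (hypothetical helper)."""
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)
    ctr = 0.5 * (base_size - 1)  # center of (0, 0, 15, 15) is 7.5
    # Ratio step: keep the area close to base_size**2, rounded to whole pixels.
    ws = np.round(np.sqrt(base_size * base_size / ratios))  # [23, 16, 11]
    hs = np.round(ws * ratios)                               # [12, 16, 22]
    # Scale step: every (ratio, scale) pair, in ratio-major order.
    ws = (ws[:, None] * scales[None, :]).ravel()
    hs = (hs[:, None] * scales[None, :]).ravel()
    # Convert (center, width, height) to (x1, y1, x2, y2).
    return np.stack([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                     ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)], axis=1)

anchors = nine_anchors()
print(anchors)
```

The rows come out in the same order as the table: all scales for ratio 0.5 first, then ratio 1, then ratio 2.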

Figure: 1 feature-map point vs. its 16 × 16 region in the original image
Let a rectangle have area $s$, width $w$, and height $h$, so that $s = h \times w$, and define the ratio as $ratio=\frac{h}{w}$. A simple rearrangement then gives the width $w$ and height $h$ for each ratio. Assume the backbone is VGG, so that each point of the feature map produced by the backbone corresponds to a $16 \times 16$ region of the original image. The areas $128 \times 128$, $256 \times 256$, and $512 \times 512$ are $8^{2}$, $16^{2}$, and $32^{2}$ times the $16 \times 16$ area, which corresponds to the scales $[8, 16, 32]$. The width and height are derived as follows.
$\begin{matrix} h = w \times ratio \\ ratio \times w^{2} = s \end{matrix} \Rightarrow \begin{matrix} w = \sqrt{s / ratio} \\ h = w \times ratio = \sqrt{s \times ratio} \end{matrix} \overset{scale}{\Rightarrow} \begin{matrix} w = scale \times \sqrt{s / ratio} \\ h = scale \times \sqrt{s \times ratio} \end{matrix}$
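As a quick numeric check of the derivation, the sketch below evaluates the formulas without any rounding. `ideal_wh` is a hypothetical helper introduced only for illustration. Because generate_anchors.py rounds the width and height to whole pixels before applying the scale, its results differ slightly from these values: for ratio 0.5 and scale 8 the formula gives about 181.02 × 90.51, while the code yields 184 × 96.

```python
import math

def ideal_wh(area, ratio, scale=1.0):
    """Unrounded w = scale * sqrt(s / ratio), h = scale * sqrt(s * ratio)."""
    w = scale * math.sqrt(area / ratio)
    h = scale * math.sqrt(area * ratio)
    return w, h

s = 16 * 16  # area of the 16 x 16 base region
for ratio in (0.5, 1, 2):
    w, h = ideal_wh(s, ratio)
    w8, h8 = ideal_wh(s, ratio, scale=8)
    print(f"ratio={ratio}: base {w:.2f} x {h:.2f}, scale 8 -> {w8:.2f} x {h8:.2f}")
```

In the unrounded form, the height-to-width ratio is exactly `ratio` and the area is exactly `scale**2 * s`, which is the property the derivation relies on.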

Counting anchors

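For a convolutional feature map of size $W \times H$, let $k$ be the maximum number of candidate boxes at each sliding-window position; there are then $W H k$ anchors in total.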

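A $3 \times 3$ sliding window moves over the $W \times H$ feature map with stride=1 and "same" padding, which produces W*H windows of size $3 \times 3$; in other words, every point of the feature map serves as the center of one sliding window. Take the red sliding window in the figure below: VGG downsamples by a factor of 16, so its center point corresponds to a $16 \times 16$ region of the original image. Once the ratios and scales parameters are set, every point of the feature map yields k anchors with different aspect ratios and areas.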

Figure: RPN
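Assume an input image of size 800*600. After 16x downsampling by the VGG network, the feature map has size $\lceil 800 / 16 \rceil \times \lceil 600 / 16 \rceil = 50 \times 38$. In Faster RCNN each point of the feature map has 9 anchors, so there are $50 \times 38 \times 9 = 17100$ anchors in total. The figure below illustrates this.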

Figure: anchor generation
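The count can be reproduced by copying the nine base anchors to every feature-map position. This is a rough sketch under stated assumptions: `all_anchors` is a hypothetical helper, the feature-map size uses the ceiling arithmetic from the text, and the base anchors are the nine boxes that generate_anchors() produces for the default parameters.

```python
import math
import numpy as np

def all_anchors(base_anchors, img_w, img_h, stride=16):
    """Copy the base anchors to every feature-map position (a sketch)."""
    feat_w = math.ceil(img_w / stride)
    feat_h = math.ceil(img_h / stride)
    # Offset of each feature-map point, in original-image pixels.
    shift_x, shift_y = np.meshgrid(np.arange(feat_w) * stride,
                                   np.arange(feat_h) * stride)
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    # (positions, 1, 4) + (1, k, 4) -> (positions, k, 4) by broadcasting.
    anchors = shifts[:, None, :] + base_anchors[None, :, :]
    return anchors.reshape(-1, 4), (feat_w, feat_h)

# The nine base anchors for base_size=16, ratios=[0.5, 1, 2], scales=[8, 16, 32].
base = np.array([[-84, -40, 99, 55], [-176, -88, 191, 103], [-360, -184, 375, 199],
                 [-56, -56, 71, 71], [-120, -120, 135, 135], [-248, -248, 263, 263],
                 [-36, -80, 51, 95], [-80, -168, 95, 183], [-168, -344, 183, 359]])

anchors, (fw, fh) = all_anchors(base, img_w=800, img_h=600)
print(fw, fh, anchors.shape)  # 50 38 (17100, 4)
```

Many of these boxes extend beyond the image border, as the negative coordinates of the base anchors already suggest.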

Generating target anchors

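For an 800*600 image, the calculation above yields 17100 anchors, and using all of them for training would be very expensive. Moreover, some of the generated anchors contribute nothing. Each anchor is therefore assigned a binary label: contains an object (positive) or does not contain an object (negative). Generating target anchors means deciding which anchors are positive samples and which are negative samples.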

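The labels are determined from the IoU between each anchor and the ground-truth boxes. There are two criteria for positive samples: for each ground-truth box, the anchor with the highest IoU with that box is positive; or an anchor whose IoU with any ground-truth box exceeds 0.7 is positive. An anchor whose IoU is below 0.3 for every ground-truth box is negative. Anchors that are neither positive nor negative, that is, those whose IoU lies in the interval (0.3, 0.7) and which are not selected by the first criterion, are considered of no use for training.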

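The first criterion guarantees that every ground-truth box has an anchor matched to it, and the second guarantees that a reasonable number of anchors are selected as positive samples.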
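The two criteria can be sketched with a pairwise IoU matrix. The code below is an illustrative sketch rather than the py-faster-rcnn anchor target layer: `iou_matrix` and `label_anchors` are hypothetical names, boxes use the same inclusive (x1, y1, x2, y2) convention as generate_anchors.py, and the label -1 for ignored anchors is an assumption of this example.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU for boxes given as inclusive (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = [anchors[:, i][:, None] for i in range(4)]
    gx1, gy1, gx2, gy2 = [gt_boxes[:, i][None, :] for i in range(4)]
    iw = np.clip(np.minimum(ax2, gx2) - np.maximum(ax1, gx1) + 1, 0, None)
    ih = np.clip(np.minimum(ay2, gy2) - np.maximum(ay1, gy1) + 1, 0, None)
    inter = iw * ih
    area_a = (ax2 - ax1 + 1) * (ay2 - ay1 + 1)
    area_g = (gx2 - gx1 + 1) * (gy2 - gy1 + 1)
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return labels: 1 = positive, 0 = negative, -1 = ignored."""
    iou = iou_matrix(anchors, gt_boxes)
    labels = np.full(len(anchors), -1, dtype=int)
    max_iou = iou.max(axis=1)
    labels[max_iou < neg_thresh] = 0
    # Criterion 1: the best anchor for each ground-truth box is positive.
    labels[iou.argmax(axis=0)] = 1
    # Criterion 2: any anchor above the positive threshold is positive.
    labels[max_iou > pos_thresh] = 1
    return labels

anchors = np.array([[0, 0, 99, 99], [10, 10, 109, 109], [200, 200, 299, 299],
                    [0, 0, 99, 59], [400, 400, 499, 499]], dtype=float)
gt_boxes = np.array([[0, 0, 99, 99], [205, 205, 284, 284]], dtype=float)
labels = label_anchors(anchors, gt_boxes)
print(labels.tolist())  # [1, -1, 1, -1, 0]
```

The third anchor has an IoU of only 0.64 with the second ground-truth box, yet it becomes positive through the first criterion, because no other anchor overlaps that box better.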

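When training the RPN, 256 anchors are randomly sampled from each image to compute the loss, with positive and negative anchors in a 1:1 ratio; if there are fewer than 128 positive samples, the batch is padded with negative samples.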
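The sampling rule can be sketched as follows. `sample_minibatch` is a hypothetical helper, and the label encoding (1 positive, 0 negative, -1 ignored) is an assumption of this example, not something the text prescribes.

```python
import numpy as np

def sample_minibatch(labels, batch_size=256, pos_fraction=0.5, rng=None):
    """Pick up to batch_size anchors, padding with negatives if positives are scarce."""
    rng = np.random.default_rng() if rng is None else rng
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    # At most half of the batch is positive.
    num_pos = min(len(pos_idx), int(batch_size * pos_fraction))
    # Negatives fill whatever is left of the batch.
    num_neg = min(len(neg_idx), batch_size - num_pos)
    keep_pos = rng.permutation(pos_idx)[:num_pos]
    keep_neg = rng.permutation(neg_idx)[:num_neg]
    return keep_pos, keep_neg

# Toy labels for 17100 anchors: 20 positive, 1000 negative, the rest ignored.
labels = np.full(17100, -1)
labels[:20] = 1
labels[20:1020] = 0

keep_pos, keep_neg = sample_minibatch(labels, rng=np.random.default_rng(0))
print(len(keep_pos), len(keep_neg))  # 20 236
```

With only 20 positive anchors available, the remaining 236 slots are taken by negatives, so the batch still contains 256 anchors.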