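[Faster R-CNN Paper Close-Reading Series]
(Suggested reading order below)
1. [Faster R-CNN Paper Close-Reading Series] What do we "learn" from the Faster R-CNN source code?
2. [Faster R-CNN Paper Close-Reading Series] Reading the code to understand the Region Proposal Network in depth
3. [Faster R-CNN Paper Close-Reading Series] Reading the code to understand anchors and anchor boxes in depth
4. [Faster R-CNN Paper Close-Reading Series] A close analysis of the original paper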
Preview
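Let us first review how the RPN (the first module) is implemented in the original Faster R-CNN paper. It relies on the anchor mechanism, which is well worth studying in depth. Building on earlier write-ups, this post examines the anchor mechanism further, following an introduction, source code, analysis, example structure to aid understanding. Please credit the source when reposting, and thanks for reading.
The figure below shows the framework of the first module in Faster R-CNN. An image of arbitrary size is taken as input and resized (the resized dimensions actually vary from image to image; a fixed value is used here only to simplify the description). A convolutional neural network then extracts features, and a set of anchors and anchor boxes is generated according to the size of the feature map:
As can be seen, anchor boxes are defined relative to the feature map. The number of anchor boxes is 75 x 100 x 9 = 67500, where 9 is the k from the paper (discussed later). In other words, the number of anchor boxes depends on the feature map: every point on the feature map corresponds to k anchor boxes.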
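The count above can be reproduced with a few lines of Python. This is only a sketch: the function name count_anchors and its parameters are made up for illustration and do not appear in the Faster R-CNN source.

```python
def count_anchors(feat_h, feat_w, num_ratios=3, num_scales=3):
    """Number of anchor boxes for a feat_h x feat_w feature map."""
    k = num_ratios * num_scales  # k anchor boxes per feature-map point
    return feat_h * feat_w * k

# The 75 x 100 feature map used in the example above
total = count_anchors(75, 100)
print(total)  # 67500
```

Changing the feature map size (that is, the input image size) changes the total, while k stays fixed at 9 for three ratios and three scales.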
Source Code Analysis
"""
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
# --------------------------------------------------------
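# What generate_anchors.py does:
# generates anchors at multiple scales and multiple aspect ratios
# scales (anchor side lengths): 128, 256, 512
# aspect ratios: 1:2, 1:1, 2:1
# --------------------------------------------------------
"""
import numpy as np  # library providing array and matrix operations
"""
# Main function for generating anchors: defines a list ratios (the aspect ratios): 1:2, 1:1, 2:1
# and an array scales: [2^3 2^4 2^5], i.e. [8 16 32]
# where 2**x means 2^x
"""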
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2**np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """
    base_anchor = np.array([1, 1, base_size, base_size]) - 1  # reference anchor: base_anchor = [0 0 15 15]
    ratio_anchors = _ratio_enum(base_anchor, ratios)  # enumerate the aspect ratios
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])  # shape[0]: length of the first dimension, here 3
    return anchors
def _whctrs(anchor):
    """
    Return width, height, x center, and y center for an anchor (window).
    """
    w = anchor[2] - anchor[0] + 1  # width: 15-0+1=16
    h = anchor[3] - anchor[1] + 1  # height: 15-0+1=16
    x_ctr = anchor[0] + 0.5 * (w - 1)  # x coordinate of the center: 0+0.5*15=7.5
    y_ctr = anchor[1] + 0.5 * (h - 1)  # y coordinate of the center: 0+0.5*15=7.5
    return w, h, x_ctr, y_ctr
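# Builds the anchor windows from paired widths and heights around a common center
# (when called from _ratio_enum the areas are roughly equal and only the aspect ratio differs)
def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given a vector of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows).
    e.g. ws:[23 16 11], hs:[12 16 22]; ws and hs correspond one to one.
    """
    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]  # np.newaxis adds an axis: shape (N,) becomes a column of shape (N, 1)
    # hstack / vstack: concatenate arrays horizontally / vertically
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),   # anchors: [[-3.5  2  18.5  13]
                         x_ctr + 0.5 * (ws - 1),   #           [ 0    0  15    15]
                         y_ctr + 0.5 * (hs - 1)))  #           [ 2.5 -3  12.5  18]]
    return anchors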
# Enumerate the aspect ratios of one anchor, e.g. anchor [0 0 15 15], ratios [0.5, 1, 2]
def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # width, height, and center: [w,h,x_ctr,y_ctr]=[16,16,7.5,7.5]
    size = w * h  # area: size=16*16=256
    size_ratios = size / ratios  # 256/[0.5,1,2]=[512,256,128]
    # round() rounds to the nearest integer; sqrt() returns the square root
    ws = np.round(np.sqrt(size_ratios))  # ws=[23,16,11]
    hs = np.round(ws * ratios)  # hs=[12,16,22]; ws and hs correspond one to one, and ratio = h / w
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # given the width and height vectors, build the windows (anchor boxes)
    return anchors
def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    Example: anchor [0 0 15 15], scales [8 16 32]
    """
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # width, height, and center: [w,h,x_ctr,y_ctr]=[16,16,7.5,7.5]
    ws = w * scales  # 16*[8,16,32]=[128,256,512]
    hs = h * scales  # 16*[8,16,32]=[128,256,512]
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)  # build the windows with _mkanchors
    return anchors
if __name__ == '__main__':
    import time
    t = time.time()
    a = generate_anchors()  # generate the anchor boxes
    print(time.time() - t)  # print the elapsed time
    print(a)
    from IPython import embed; embed()
def _ratio_enum(anchor, ratios)
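The author defines the _ratio_enum function, which generates anchors with three aspect ratios (multi aspect ratio; the ratio array is ratios=[0.5, 1, 2]), as shown in the figure below (the three ratios correspond to green, red, and blue, respectively):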
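As a rough illustration of this step, the sketch below recomputes the three widths and heights for a 16 x 16 base window. The helper ratio_enum is a simplified stand-in written for this post, not the repository function; it follows the same formulas (width from sqrt(area / ratio), height from width * ratio), so each ratio is height divided by width.

```python
import numpy as np

def ratio_enum(w, h, ratios):
    """Widths and heights that keep the area of a w x h window roughly fixed."""
    ratios = np.asarray(ratios, dtype=float)
    size_ratios = (w * h) / ratios       # area / ratio = squared width
    ws = np.round(np.sqrt(size_ratios))  # widths, rounded to whole pixels
    hs = np.round(ws * ratios)           # heights, since ratio = h / w
    return ws, hs

ws, hs = ratio_enum(16, 16, [0.5, 1, 2])
print(ws.tolist())         # [23.0, 16.0, 11.0]
print(hs.tolist())         # [12.0, 16.0, 22.0]
print((ws * hs).tolist())  # [276.0, 256.0, 242.0]
```

Because of the rounding, the three areas (276, 256, 242) are only approximately equal to the base area of 256.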
def _scale_enum(anchor, scales)
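The _scale_enum() function rescales an anchor to produce anchors of three sizes (each size in turn comes in three aspect ratios, giving k=9 anchors in total). Taking the anchor [0 0 15 15] produced by _ratio_enum() as an example, it is expanded to three scales, 128 x 128, 256 x 256, and 512 x 512, as shown in the figure below: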
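Putting the ratio step and the scale step together gives the k=9 anchors. The following is a compact, vectorized sketch that should be equivalent to the listing for the default arguments; the names make_anchors and nine_anchors are invented here, and the outer-product formulation is a choice made for this example rather than how the repository code is written.

```python
import numpy as np

def make_anchors(ws, hs, x_ctr, y_ctr):
    """Stack (x1, y1, x2, y2) rows for windows centered at (x_ctr, y_ctr)."""
    ws = np.asarray(ws, dtype=float)
    hs = np.asarray(hs, dtype=float)
    return np.stack([x_ctr - 0.5 * (ws - 1),
                     y_ctr - 0.5 * (hs - 1),
                     x_ctr + 0.5 * (ws - 1),
                     y_ctr + 0.5 * (hs - 1)], axis=1)

def nine_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """All ratio x scale combinations around the center of the base window."""
    ctr = 0.5 * (base_size - 1)  # center of (0, 0, 15, 15) is 7.5
    ratios = np.asarray(ratios, dtype=float)
    scales = np.asarray(scales, dtype=float)
    ws = np.round(np.sqrt(base_size * base_size / ratios))
    hs = np.round(ws * ratios)
    # Outer product: every ratio (rows) times every scale (columns),
    # flattened so that the three scales of one ratio stay together.
    all_ws = (ws[:, None] * scales[None, :]).ravel()
    all_hs = (hs[:, None] * scales[None, :]).ravel()
    return make_anchors(all_ws, all_hs, ctr, ctr)

anchors = nine_anchors()
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
print(anchors.shape)                 # (9, 4)
print(widths.astype(int).tolist())   # [184, 368, 736, 128, 256, 512, 88, 176, 352]
print(heights.astype(int).tolist())  # [96, 192, 384, 128, 256, 512, 176, 352, 704]
```

Only the rows for ratio 1 are exactly 128 x 128, 256 x 256, and 512 x 512; the other two ratios give rectangles of roughly the same area.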
Summary of the Anchor Mechanism (the key part)
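- First, be clear about what an anchor and an anchor box are:
Anchor boxes are defined at the feature points of the feature map (these feature points are the anchors). The image size therefore determines the number of anchors, and the number of anchors in turn determines the number of anchor boxes.
- Second, by what rule are the boxes at each anchor generated:
From the signature in the source, def generate_anchors(base_size=16, ratios=[0.5, 1, 2], scales=2**np.arange(3, 6)), we can see that the base size is 16, the aspect ratios are [0.5, 1, 2], and the scale factors are [8, 16, 32], which turn the base size of 16 into side lengths of [128, 256, 512]. In other words, the 16*16 base anchor box attached to a feature-map point is mapped, through the aspect-ratio and scale transformations, to region proposals in the original image. (Important to understand!)
- Next, the individual functions in the source and what they do:
- generate_anchors: Generate anchor (reference) windows
- _whctrs: Return width, height, x center, and y center for an anchor (window)
- _mkanchors: Given a vector of widths (ws) and heights (hs) around a center (x_ctr, y_ctr), output a set of anchors (windows)
- _ratio_enum: Enumerate a set of anchors for each aspect ratio wrt an anchor.
- _scale_enum: Enumerate a set of anchors for each scale wrt an anchor.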
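To make the link between feature-map points and anchor boxes concrete, the sketch below copies the 9 base anchors to every point of a 75 x 100 feature map. It assumes a stride of 16 input pixels per feature-map point (matching base_size=16); the function name shift_anchors and the hand-written BASE_ANCHORS table are assumptions of this example, not code from generate_anchors.py.

```python
import numpy as np

# The 9 base anchors for base_size=16, ratios=[0.5, 1, 2], scales=[8, 16, 32],
# written out by hand for this sketch.
BASE_ANCHORS = np.array([
    [ -84,  -40,   99,   55],
    [-176,  -88,  191,  103],
    [-360, -184,  375,  199],
    [ -56,  -56,   71,   71],
    [-120, -120,  135,  135],
    [-248, -248,  263,  263],
    [ -36,  -80,   51,   95],
    [ -80, -168,   95,  183],
    [-168, -344,  183,  359],
], dtype=float)

def shift_anchors(base_anchors, feat_h, feat_w, feat_stride=16):
    """Place a copy of every base anchor at every feature-map point."""
    shift_x = np.arange(feat_w) * feat_stride  # x offsets in image pixels
    shift_y = np.arange(feat_h) * feat_stride  # y offsets in image pixels
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # One (dx, dy, dx, dy) row per feature-map point, in row-major order
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    # Broadcast (1, A, 4) + (K, 1, 4) -> (K, A, 4), then flatten to (K * A, 4)
    all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
    return all_anchors.reshape(-1, 4)

all_anchors = shift_anchors(BASE_ANCHORS, feat_h=75, feat_w=100)
print(all_anchors.shape)        # (67500, 4)
print(all_anchors[9].tolist())  # [-68.0, -40.0, 115.0, 55.0]
```

Row 9 is the first base anchor moved one feature-map point (16 pixels) to the right, which is what "every point on the feature map corresponds to k anchor boxes" means in image coordinates.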
RPN Source Code Analysis (rpn module overview)
generate_anchors.py
Generates a regular grid of multi-scale, multi-aspect anchor boxes.
proposal_layer.py
Converts RPN outputs (per-anchor scores and bbox regression estimates) into object proposals.
anchor_target_layer.py
Generates training targets/labels for each anchor. Classification labels are 1 (object), 0 (not object) or -1 (ignore).
Bbox regression targets are specified when the classification label is > 0.
proposal_target_layer.py
Generates training targets/labels for each object proposal: classification labels 0 - K (bg or object class 1, … , K)
and bbox regression targets in the case that the label is > 0.
generate.py
Generate object detection proposals from an imdb using an RPN.
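The 1 / 0 / -1 labels mentioned for anchor_target_layer.py can be illustrated with a simplified sketch. It assumes the IoU thresholds described in the Faster R-CNN paper (positive at 0.7 or above, negative below 0.3, and the best-matching anchor for each ground-truth box also positive). The functions iou_matrix and label_anchors are hypothetical names for this example and leave out parts of the real layer, such as discarding anchors that cross the image boundary and subsampling the labels.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """IoU between every anchor (N, 4) and every ground-truth box (M, 4)."""
    a = anchors[:, None, :]   # (N, 1, 4)
    g = gt_boxes[None, :, :]  # (1, M, 4)
    iw = np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]) + 1
    ih = np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]) + 1
    inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
    area_a = (a[..., 2] - a[..., 0] + 1) * (a[..., 3] - a[..., 1] + 1)
    area_g = (g[..., 2] - g[..., 0] + 1) * (g[..., 3] - g[..., 1] + 1)
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (object), 0 (not object), or -1 (ignore) for each anchor."""
    overlaps = iou_matrix(anchors, gt_boxes)
    max_per_anchor = overlaps.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=int)  # start as "ignore"
    labels[max_per_anchor < neg_thresh] = 0        # clearly background
    labels[overlaps.argmax(axis=0)] = 1            # best anchor for each gt box
    labels[max_per_anchor >= pos_thresh] = 1       # high-overlap anchors
    return labels

anchors = np.array([[0, 0, 15, 15], [4, 4, 19, 19], [100, 100, 115, 115]], dtype=float)
gt_boxes = np.array([[0, 0, 15, 15]], dtype=float)
labels = label_anchors(anchors, gt_boxes)
print(labels.tolist())  # [1, -1, 0]
```

The second anchor overlaps the ground-truth box with IoU 144/368, about 0.39, which falls between the two thresholds and is therefore ignored.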
Understanding the Source Code Through an Example
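- A small example is already woven into the source comments above; here is the same calculation worked by hand:
Take the reference anchor box as an example:
anchor=[x1,y1,x2,y2]=[0,0,15,15]
Compute the width, height, and center coordinates (anchor[0] + 0.5 * (w - 1)) and (anchor[1] + 0.5 * (h - 1)):
[w, h, x_ctr, y_ctr]=[16,16,7.5,7.5]
Enumerate the anchors for each scale:
w, h, x_ctr, y_ctr = _whctrs(anchor) # width, height, and center: [w,h,x_ctr,y_ctr]=[16,16,7.5,7.5]
ws = w * scales # 16*[8,16,32]=[128,256,512]
hs = h * scales # 16*[8,16,32]=[128,256,512]
Enumerate the anchors for each ratio (using the base size base_size: 16 as the example):
Get the width, height, and center: [w, h, x_ctr, y_ctr]=[16,16,7.5,7.5]
Compute the area: size=16*16=256
Divide the area by each ratio, which gives the squared width for that ratio: 256/[0.5,1,2]=[512,256,128]
Compute the new w (ws = np.round(np.sqrt(size_ratios))) and h (hs = np.round(ws * ratios)): ws=[23,16,11] and hs=[12,16,22]
- In addition, generate_anchors.py contains a note on verifying that the Python implementation produces the same anchors as the MATLAB implementation (see the source):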
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
# >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
# >> anchors
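The hand calculation in this section can also be checked numerically. The sketch below re-implements the two helpers in simplified form (the names whctrs and mkanchors mirror the source functions but are rewritten here) and turns the hand-computed widths and heights back into box coordinates.

```python
import numpy as np

def whctrs(anchor):
    """Width, height, and center of an (x1, y1, x2, y2) window (pixel-inclusive)."""
    x1, y1, x2, y2 = anchor
    w = x2 - x1 + 1
    h = y2 - y1 + 1
    return w, h, x1 + 0.5 * (w - 1), y1 + 0.5 * (h - 1)

def mkanchors(ws, hs, x_ctr, y_ctr):
    """Inverse of whctrs, applied to vectors of widths and heights."""
    ws = np.asarray(ws, dtype=float)[:, np.newaxis]  # (N,) -> (N, 1) column
    hs = np.asarray(hs, dtype=float)[:, np.newaxis]
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))

w, h, x_ctr, y_ctr = whctrs([0, 0, 15, 15])
print(w, h, x_ctr, y_ctr)  # 16 16 7.5 7.5

boxes = mkanchors([23, 16, 11], [12, 16, 22], x_ctr, y_ctr)
print(boxes.tolist())
# [[-3.5, 2.0, 18.5, 13.0], [0.0, 0.0, 15.0, 15.0], [2.5, -3.0, 12.5, 18.0]]
```

The three rows match the values annotated next to _mkanchors in the listing, and applying whctrs to any row returns the width, height, and the shared center (7.5, 7.5).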
Reference
https://blog.csdn.net/smf0504/article/details/52751257?tdsourcetag=s_pctim_aiomsg
https://blog.csdn.net/qian99/article/details/79942591