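The original Faster R-CNN paper uses three sizes (128, 256, 512) and three aspect ratios (1:1, 1:2, 2:1), for nine different anchors in total. It does not explain in detail how the sizes of the 1:2 and 2:1 anchors are computed, so the only option is to work it out from the source code.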
import numpy as np

def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2 ** np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors
First, look at the parameters of the generate_anchors function in the source code:
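1. base_size is the side length of the base anchor. It matches the total stride of the feature map: the paper states that the total stride is 16 for both the ZF and VGG nets.
“On the re-scaled images, the total stride for both ZF and VGG nets on the last convolutional layer is 16 pixels”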
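2. ratios lists the anchor aspect ratios: 1:2 = 0.5, 1:1 = 1, 2:1 = 2. In the code, a ratio is height divided by width (hs = ws * ratios in _ratio_enum).
3. scales gives the anchor scale factors. np.arange(3, 6) starts at 3 and stops before 6, yielding 3, 4, 5; used as exponents of 2, these give 8, 16, 32. Note that no anchor sizes have been generated at this point: the side lengths 128, 256, 512 appear only after these factors are multiplied by base_size = 16.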
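base_anchor is the initial anchor (0, 0, 15, 15), where the first two numbers are the coordinates of the top-left corner and the last two are the coordinates of the bottom-right corner.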
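The default arguments can be checked directly with a short sketch. The variable name side_lengths is introduced here for illustration only and does not appear in the source; it assumes the square (1:1) anchor, whose side is simply base_size times each scale factor.

```python
import numpy as np

base_size = 16                  # total stride quoted from the paper
scales = 2 ** np.arange(3, 6)   # exponents 3, 4, 5 -> 8, 16, 32

# Same expression as in generate_anchors: corners of a 16x16 window
# whose top-left pixel is (0, 0).
base_anchor = np.array([1, 1, base_size, base_size]) - 1

# Hypothetical helper value: side length of the 1:1 anchor at each scale.
side_lengths = base_size * scales

print(scales.tolist())        # [8, 16, 32]
print(base_anchor.tolist())   # [0, 0, 15, 15]
print(side_lengths.tolist())  # [128, 256, 512]
```

The last line recovers the three sizes quoted from the paper, which suggests that 128, 256 and 512 are the side lengths of the square anchors.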
Next, look at the _ratio_enum function it calls, together with the _whctrs function:
def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h  # 16 * 16 = 256
    size_ratios = size / ratios  # 512, 256, 128
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

def _whctrs(anchor):
    """
    Return width, height, x center, and y center for an anchor (window).
    """
    w = anchor[2] - anchor[0] + 1  # 16
    h = anchor[3] - anchor[1] + 1  # 16
    x_ctr = anchor[0] + 0.5 * (w - 1)  # 7.5
    y_ctr = anchor[1] + 0.5 * (h - 1)  # 7.5
    return w, h, x_ctr, y_ctr
w, h, x_ctr and y_ctr are the anchor's width, height, center x-coordinate and center y-coordinate; they are computed by the _whctrs function.
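As an example, if the anchor [0, 0, 15, 15] is passed in, the computation proceeds as shown in the figure below and yields w = h = 16 and x_ctr = y_ctr = 7.5.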
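The same computation can be reproduced with a small standalone sketch. The function whctrs below is a copy of _whctrs made for illustration (the name without the underscore is not from the source), and the second call with a single-pixel box is an added example.

```python
def whctrs(anchor):
    """Illustrative copy of _whctrs: corners -> (w, h, x_ctr, y_ctr)."""
    # The +1 treats the corner coordinates as inclusive pixel indices.
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    x_ctr = anchor[0] + 0.5 * (w - 1)
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr

base = whctrs([0, 0, 15, 15])
print(base)          # (16, 16, 7.5, 7.5)

# A box whose two corners coincide still covers one pixel.
single_pixel = whctrs([0, 0, 0, 0])
print(single_pixel)  # (1, 1, 0.0, 0.0)
```

Because the corners are inclusive, the window from 0 to 15 is 16 pixels wide, and its center falls between pixels 7 and 8, at 7.5.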
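The resulting w, h, x_ctr and y_ctr are returned to _ratio_enum. There, size_ratios = size / ratios is where the three anchor sizes are actually produced: the areas 512, 256 and 128, one per aspect ratio, from which the widths ws and heights hs are derived.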
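The arithmetic inside _ratio_enum can be traced with a minimal sketch. It assumes the base anchor's w = h = 16 from the example above and passes ratios as a NumPy array (the source passes a plain list); the variable areas is added here only to show the effect of rounding.

```python
import numpy as np

ratios = np.array([0.5, 1, 2])  # assumption: array instead of the source's list
w, h = 16, 16                   # width and height of the base anchor

size = w * h                          # 256
size_ratios = size / ratios           # target areas before the square root
ws = np.round(np.sqrt(size_ratios))   # anchor widths
hs = np.round(ws * ratios)            # anchor heights

areas = ws * hs                       # illustrative: actual areas after rounding

print(size_ratios.tolist())  # [512.0, 256.0, 128.0]
print(ws.tolist())           # [23.0, 16.0, 11.0]
print(hs.tolist())           # [12.0, 16.0, 22.0]
print(areas.tolist())        # [276.0, 256.0, 242.0]
```

So the three ratio anchors are 23x12, 16x16 and 11x22. Their areas stay close to 256 but are not exactly equal, because both the width and the height are rounded. Note that np.round rounds 11.5 to 12 (ties go to the nearest even number).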
To be continued…