Object Detection - Anchor Handling in Generalized Focal Loss
flyfish
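The code analyzed here is RangiLyu's nanodet, which provides a complete solution for lightweight models, from training through to deployment on Android.
Source code
The code discussed here is located at nanodet/nanodet/model/head/gfl_head.py
1. Basic configuration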
octave_base_scale: 5
scales_per_octave: 1
strides: [8, 16, 32]
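batchsize_per_gpu: 2 # originally 160; reduced so the printed output is easier to inspect
Multi-level feature map information
The sizes of the multi-level feature maps are given as a list[tuple].
There are three feature-map levels here: [40, 40], [20, 20], [10, 10]
Each feature map is paired with a stride: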
[40, 40]:8
[20, 20]:16
[10, 10]:32
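The pairing above follows from dividing the input size by each stride. Below is a minimal sketch, assuming the 320×320 input used in this post and that the input is exactly divisible by every stride. The helper name featmap_sizes is hypothetical and is not part of nanodet.

```python
def featmap_sizes(input_hw, strides):
    """Return one [h, w] feature-map size per stride (assumes exact divisibility)."""
    h, w = input_hw
    return [[h // s, w // s] for s in strides]


strides = [8, 16, 32]
sizes = featmap_sizes((320, 320), strides)
for size, stride in zip(sizes, strides):
    print(size, ":", stride)
```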
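Image information
img_shapes (h, w): each image is 320×320
Assuming a single GPU, the number of images equals the batchsize_per_gpu setting, i.e. the batch size used during training.
With batchsize_per_gpu=80, for example, there would be 80 images, i.e. 80 entries of [320, 320].
The batchsize_per_gpu=2 case is listed here: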
[[320, 320], [320, 320]]
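2. base_anchor
The base_anchor is already generated at initialization time.
Three feature maps give three base boxes. With octave_base_scale = 5, the sizes of the three base boxes are listed below.
The value 5 is effectively a hyperparameter here. anchor_base_sizes = list(strides) = [8, 16, 32]
[-16., -16., 23., 23.] base_size = 8, for the 40 × 40 feature map
[-32., -32., 47., 47.] base_size = 16, for the 20 × 20 feature map
[-64., -64., 95., 95.] base_size = 32, for the 10 × 10 feature map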
Relationship (octave_base_scale × base_size compared with x2 - x1 of the base box)
(5 × 8) = 40 ≈ 39 = (16 + 23)
(5 × 16) = 80 ≈ 79 = (32 + 47)
(5 × 32) = 160 ≈ 159 = (64 + 95)
The base_anchor is computed in the gen_base_anchors(self) method of class AnchorGenerator(object).
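The three base boxes can be reproduced with a short sketch. It assumes a single aspect ratio of 1, a single scale equal to octave_base_scale, and the inclusive-pixel (w - 1) convention, which is why x2 - x1 comes out one less than scale × base_size. The helper gen_base_anchor is hypothetical and written only for illustration; it is not the body of gen_base_anchors.

```python
def gen_base_anchor(base_size, scale):
    """Square base anchor [x1, y1, x2, y2] centred on the first stride cell.

    Assumption: a box that covers `side` pixels spans x1 .. x1 + (side - 1).
    """
    side = base_size * scale        # e.g. 8 * 5 = 40
    ctr = 0.5 * (base_size - 1)     # centre of the first cell, e.g. 3.5
    half = 0.5 * (side - 1)         # e.g. 19.5
    return [ctr - half, ctr - half, ctr + half, ctr + half]


octave_base_scale = 5
for base_size in [8, 16, 32]:
    print(base_size, gen_base_anchor(base_size, octave_base_scale))
```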
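3. Anchor generation
The 10×10 feature map is used for illustration here; the other two are generated the same way.
For the 10×10 feature map with stride = 32, consider the mapping between the 320×320 input image and the third (10×10) feature map: 100 boxes of size 159 × 159 are generated. In other words, the 10×10 feature map is projected back onto the 320×320 image, and the anchors are laid out on that image.
For each of the three feature maps, the boxes are generated from base_anchor, stride, shift_x and shift_y.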
feature map 40×40
base_anchors: tensor([[-16., -16., 23., 23.]])
featmap_sizes: torch.Size([40, 40])
anchor_strides: 8
shift_x: tensor([ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104,
112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216,
224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312])
shift_y: tensor([ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104,
112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216,
224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312])
feature map 20×20
base_anchors: tensor([[-32., -32., 47., 47.]])
featmap_sizes: torch.Size([20, 20])
anchor_strides: 16
shift_x: tensor([ 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208,
224, 240, 256, 272, 288, 304])
shift_y: tensor([ 0, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208,
224, 240, 256, 272, 288, 304])
feature map 10×10
base_anchors: tensor([[-64., -64., 95., 95.]])
featmap_sizes: torch.Size([10, 10])
anchor_strides: 32
shift_x: tensor([ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288])
shift_y: tensor([ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288])
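The shift values printed for each level are simply the cell indices multiplied by the stride. A minimal pure-Python sketch, using lists instead of tensors (make_shifts is a hypothetical helper name):

```python
def make_shifts(feat_len, stride):
    """Offsets of each feature-map cell along one axis, in input-image pixels."""
    return [i * stride for i in range(feat_len)]


for feat_len, stride in [(40, 8), (20, 16), (10, 32)]:
    shifts = make_shifts(feat_len, stride)
    print(feat_len, stride, shifts[:4], "...", shifts[-1])
```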
Explanation
Take the 10×10 feature map as an example.
base_anchors: tensor([[-64., -64., 95., 95.]])
featmap_sizes: torch.Size([10, 10])
anchor_strides: 32
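Because the size is 10, the 10 integers from 0 to 9 are generated.
Multiplying each of them by stride = 32 gives shift_x.
Because the two entries of featmap_size (10 and 10) are equal, shift_x and shift_y are identical.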
shift_x: tensor([ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288])
shift_y: tensor([ 0, 32, 64, 96, 128, 160, 192, 224, 256, 288])
base_anchors + (shift_x, shift_y) = the shifted boxes
For example, with the base box [-64., -64., 95., 95.]:
[-64 + shift_x, -64 + shift_y, 95 + shift_x, 95 + shift_y]
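The addition above can be sketched for the first row of the 10×10 feature map (shift_y = 0). This is an illustration with plain lists, and shift_anchor is a hypothetical helper rather than a nanodet function.

```python
def shift_anchor(base_anchor, sx, sy):
    """Translate an [x1, y1, x2, y2] box by (sx, sy)."""
    x1, y1, x2, y2 = base_anchor
    return [x1 + sx, y1 + sy, x2 + sx, y2 + sy]


base_anchor = [-64.0, -64.0, 95.0, 95.0]
shift_x = [i * 32 for i in range(10)]

# First row of anchors: x moves by one stride per step, y stays at 0.
first_row = [shift_anchor(base_anchor, sx, 0) for sx in shift_x]
for box in first_row[:4]:
    print(box)
```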
In the code, the grid of shift_x and shift_y values is produced by the _meshgrid function:
def _meshgrid(self, x, y, row_major=True):
    # tile x once per row: x0..xn, x0..xn, ...
    xx = x.repeat(len(y))
    # hold each y for a whole row: y0..y0, y1..y1, ...
    yy = y.view(-1, 1).repeat(1, len(x)).view(-1)
    if row_major:
        return xx, yy
    else:
        return yy, xx
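The two flattened grids are then stacked into one (x, y, x, y) offset per position, which is added to the base box:
shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)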
[-64., -64., 95., 95.] + (shift_x[0],shift_y[0]) = [-64., -64., 95., 95.]
[-64., -64., 95., 95.] + (shift_x[1],shift_y[0]) = [-32., -64., 127., 95.]
[-64., -64., 95., 95.] + (shift_x[2],shift_y[0]) = [ 0., -64., 159., 95.]
[-64., -64., 95., 95.] + (shift_x[3],shift_y[0]) = [ 32., -64., 191., 95.]
......
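To see why x varies fastest in the listing above, here is a pure-Python analogue of _meshgrid with row_major=True on a tiny grid of 3 columns and 2 rows. meshgrid_row_major is a hypothetical stand-in that uses lists instead of tensors.

```python
def meshgrid_row_major(x, y):
    """List-based analogue of _meshgrid(x, y, row_major=True)."""
    xx = x * len(y)                   # x tiled once per row
    yy = [v for v in y for _ in x]    # each y held for a whole row
    return xx, yy


xx, yy = meshgrid_row_major([0, 32, 64], [0, 32])
pairs = list(zip(xx, yy))
print(pairs)
```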
For the 10×10 feature map this gives 10 boxes across and 10 down, 100 boxes in total:
[-64., -64., 95., 95.],
[-32., -64., 127., 95.],
[ 0., -64., 159., 95.],
[ 32., -64., 191., 95.],
[ 64., -64., 223., 95.],
[ 96., -64., 255., 95.],
[128., -64., 287., 95.],
[160., -64., 319., 95.],
[192., -64., 351., 95.],
[224., -64., 383., 95.],
[-64., -32., 95., 127.],
[-32., -32., 127., 127.],
[ 0., -32., 159., 127.],
[ 32., -32., 191., 127.],
[ 64., -32., 223., 127.],
[ 96., -32., 255., 127.],
[128., -32., 287., 127.],
[160., -32., 319., 127.],
[192., -32., 351., 127.],
[224., -32., 383., 127.],
[-64., 0., 95., 159.],
[-32., 0., 127., 159.],
[ 0., 0., 159., 159.],
[ 32., 0., 191., 159.],
[ 64., 0., 223., 159.],
[ 96., 0., 255., 159.],
[128., 0., 287., 159.],
[160., 0., 319., 159.],
[192., 0., 351., 159.],
[224., 0., 383., 159.],
......
Only the first 30 boxes of the 10×10 feature map are listed above.
The code that implements this is in the grid_anchors function:
def grid_anchors(self, featmap_size, stride=16, device='cuda'):
    base_anchors = self.base_anchors.to(device)

    feat_h, feat_w = featmap_size
    shift_x = torch.arange(0, feat_w, device=device) * stride
    shift_y = torch.arange(0, feat_h, device=device) * stride
    shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
    shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
    shifts = shifts.type_as(base_anchors)
    # first feat_w elements correspond to the first row of shifts
    # add A anchors (1, A, 4) to K shifts (K, 1, 4) to get
    # shifted anchors (K, A, 4), reshape to (K*A, 4)

    all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
    all_anchors = all_anchors.view(-1, 4)
    # first A rows correspond to A anchors of (0, 0) in feature map,
    # then (0, 1), (0, 2), ...
    return all_anchors
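The same result can be cross-checked without tensors. The sketch below assumes a single base anchor per level (A = 1) and uses explicit loops in place of the broadcasted addition. grid_anchors_py is a hypothetical name chosen to avoid confusion with the real method.

```python
def grid_anchors_py(base_anchor, featmap_size, stride):
    """Loop-based equivalent of the broadcasted add, for one base anchor."""
    feat_h, feat_w = featmap_size
    x1, y1, x2, y2 = base_anchor
    anchors = []
    for row in range(feat_h):        # y varies slowest
        for col in range(feat_w):    # x varies fastest
            sx, sy = col * stride, row * stride
            anchors.append([x1 + sx, y1 + sy, x2 + sx, y2 + sy])
    return anchors


anchors = grid_anchors_py([-64.0, -64.0, 95.0, 95.0], (10, 10), 32)
print(len(anchors))
print(anchors[22])
```

Index 22 is row 2, column 2 of the grid, which is the box [0., 0., 159., 159.] in the listing above.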
Across the three feature maps: 40×40 + 20×20 + 10×10 = 1600 + 400 + 100 = 2100 boxes
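The total can be checked with a few lines, assuming one base anchor per feature-map cell as in this configuration (the variable names are illustrative only):

```python
featmap_sizes = [(40, 40), (20, 20), (10, 10)]
anchors_per_cell = 1  # assumption: one base anchor per level in this config

per_level = [h * w * anchors_per_cell for h, w in featmap_sizes]
total = sum(per_level)
print(per_level, total)
```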