Object Detection - Anchor Handling in Generalized Focal Loss

西笑生

Category: Object Detection. Tags: nanodet, Focal Loss, object detection, anchorfree

Article link: https://blog.csdn.net/flyfish1986/article/details/110245329

flyfish

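The code analyzed here is nanodet by RangiLyu. nanodet provides a complete solution for lightweight models, from training through installation and deployment on Android.
Source code: RangiLyu's nanodet repository
The relevant code is in nanodet/nanodet/model/head/gfl_head.py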

1. Basic configuration

octave_base_scale: 5
scales_per_octave: 1
strides: [8, 16, 32]
batchsize_per_gpu: 2 # originally 160; reduced so the printed output is easier to read

Multi-level feature map information

The multi-level feature map sizes are a list[tuple].
Here there are three levels: [40, 40], [20, 20], [10, 10].
Each feature map is paired with a stride:
[40, 40]: 8
[20, 20]: 16
[10, 10]: 32
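
The pairing above follows from dividing the input side by each stride. A minimal sketch, assuming a 320×320 input and ceiling division; the helper name `featmap_sizes_for` is hypothetical and not part of nanodet:

```python
import math

def featmap_sizes_for(img_h, img_w, strides):
    """Return one (h, w) feature-map size per stride, using ceiling division."""
    return [(math.ceil(img_h / s), math.ceil(img_w / s)) for s in strides]

strides = [8, 16, 32]
sizes = featmap_sizes_for(320, 320, strides)
for size, stride in zip(sizes, strides):
    print(size, stride)  # each feature-map size next to its stride
```

For inputs that are exact multiples of the largest stride, as here, ceiling and floor division give the same sizes.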

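Image information

img_shapes (h, w): each image is 320×320.
Assuming a single GPU, the number of images equals the batchsize_per_gpu setting, i.e. the training batch size.
With batchsize_per_gpu=80 there would be 80 images, i.e. 80 entries of [320, 320].
The case batchsize_per_gpu=2 is listed here:
[[320, 320], [320, 320]]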

2. base_anchor

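The base_anchor boxes are already generated at initialization.
Three feature maps give three base boxes. With octave_base_scale = 5, the sizes of the three base boxes are listed below.
The value 5 can be viewed as a hyperparameter. anchor_base_sizes = list(strides) = [8, 16, 32]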

[-16., -16.,  23.,  23.] base_size = 8   feature map 40 × 40
[-32., -32.,  47.,  47.] base_size = 16  feature map 20 × 20
[-64., -64.,  95.,  95.] base_size = 32  feature map 10 × 10

Relationship

(5 × 8)  = 40  ≈ 39  = (16+23)
(5 × 16) = 80  ≈ 79  = (32+47)
(5 × 32) = 160 ≈ 159 = (64+95)

Counting both end pixels (x2 - x1 + 1), the sides are exactly 40, 80, and 160.

The base_anchor computation is in the gen_base_anchors(self) method of class AnchorGenerator(object).
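
The three base boxes listed above can be reproduced with a few lines of arithmetic. The sketch below assumes a single scale and a 1:1 aspect ratio, with the box centered on the first stride cell at (base_size - 1) / 2. It is a plain-Python illustration that matches the printed values, not a copy of gen_base_anchors, and `base_anchor` is a hypothetical helper name.

```python
def base_anchor(base_size, scale):
    """Square box of side base_size * scale (inclusive pixels), centered on the first cell."""
    ctr = 0.5 * (base_size - 1)            # center of the first stride cell
    half = 0.5 * (base_size * scale - 1)   # half side length, inclusive-pixel convention
    lo, hi = round(ctr - half), round(ctr + half)
    return [lo, lo, hi, hi]                # [x1, y1, x2, y2]; x and y match for a square

for base_size in (8, 16, 32):
    print(base_size, base_anchor(base_size, 5))
```

Under this convention the side length x2 - x1 + 1 equals base_size × octave_base_scale exactly.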

3. Anchor generation

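The 10×10 feature map is used for illustration; the other two are generated the same way.
For the 10×10 feature map with stride = 32, the mapping between the 320×320 original image and the third (10×10) feature map produces 100 boxes of size 159 × 159. In other words, each cell of the 10×10 feature map is mapped back onto the 320×320 original image, and the anchors are generated on the original image.

For each of the three feature maps, boxes are generated from base_anchor, stride, shift_x, and shift_y.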

feature map 40×40

base_anchors: tensor([[-16., -16.,  23.,  23.]])
featmap_sizes: torch.Size([40, 40]) 
anchor_strides: 8
shift_x: tensor([  0,   8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96, 104,
        112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216,
        224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312])
shift_y: tensor([  0,   8,  16,  24,  32,  40,  48,  56,  64,  72,  80,  88,  96, 104,
        112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216,
        224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312])

feature map 20×20

base_anchors: tensor([[-32., -32.,  47.,  47.]])
featmap_sizes: torch.Size([20, 20]) 
anchor_strides: 16
shift_x: tensor([  0,  16,  32,  48,  64,  80,  96, 112, 128, 144, 160, 176, 192, 208,
        224, 240, 256, 272, 288, 304])
shift_y: tensor([  0,  16,  32,  48,  64,  80,  96, 112, 128, 144, 160, 176, 192, 208,
        224, 240, 256, 272, 288, 304])

feature map 10×10

base_anchors: tensor([[-64., -64.,  95.,  95.]])
featmap_sizes: torch.Size([10, 10]) 
anchor_strides: 32
shift_x: tensor([  0,  32,  64,  96, 128, 160, 192, 224, 256, 288])
shift_y: tensor([  0,  32,  64,  96, 128, 160, 192, 224, 256, 288])
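
The three printouts share one pattern: each shift is a grid index times the stride, and each level contributes feat_h × feat_w boxes, assuming a single base anchor per location as the one-row base_anchors tensors suggest. A small sketch under that assumption, with illustrative variable names:

```python
levels = [((40, 40), 8), ((20, 20), 16), ((10, 10), 32)]  # (featmap_size, stride)

total = 0
for (feat_h, feat_w), stride in levels:
    shift_x = [i * stride for i in range(feat_w)]
    shift_y = [j * stride for j in range(feat_h)]
    count = len(shift_x) * len(shift_y)   # one base anchor per grid location
    total += count
    print(stride, shift_x[-1], shift_y[-1], count)

print(total)  # 1600 + 400 + 100 = 2100
```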

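Explanation

Take the 10×10 feature map as an example:
base_anchors: tensor([[-64., -64., 95., 95.]])
featmap_sizes: torch.Size([10, 10])
anchor_strides: 32
Because the size is 10, the 10 integers 0 through 9 are generated.
Multiplying each by stride = 32 gives shift_x.
Because the two entries of featmap_size (10 and 10) are equal, shift_x and shift_y are identical.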

shift_x: tensor([  0,  32,  64,  96, 128, 160, 192, 224, 256, 288])
shift_y: tensor([  0,  32,  64,  96, 128, 160, 192, 224, 256, 288])

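base_anchors + (shift_x, shift_y) = shifted boxes

For example, the base box is [-64., -64., 95., 95.], and each shifted box is
[-64 + shift_x, -64 + shift_y, 95 + shift_x, 95 + shift_y]
In the code, every (shift_x, shift_y) combination is built by the _meshgrid function: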

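def _meshgrid(self, x, y, row_major=True):
    # Tile x once per element of y: x0, x1, ..., x0, x1, ...
    xx = x.repeat(len(y))
    # Repeat each element of y len(x) times: y0, y0, ..., y1, y1, ...
    yy = y.view(-1, 1).repeat(1, len(x)).view(-1)
    if row_major:
        return xx, yy
    else:
        return yy, xx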

shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
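
To make the ordering produced by _meshgrid concrete, here is a plain-Python sketch of the row-major case using lists instead of tensors. The function name `meshgrid_lists` is hypothetical; only the ordering is meant to mirror the method above.

```python
def meshgrid_lists(x, y):
    """Row-major grid: x cycles fastest, y is held fixed across each row."""
    xx = [xi for _ in y for xi in x]   # like x.repeat(len(y))
    yy = [yi for yi in y for _ in x]   # like y.view(-1, 1).repeat(1, len(x)).view(-1)
    return xx, yy

xx, yy = meshgrid_lists([0, 32, 64], [0, 32])
print(xx)  # [0, 32, 64, 0, 32, 64]
print(yy)  # [0, 0, 0, 32, 32, 32]
```

Pairing xx[k] with yy[k] walks the grid one row at a time, which is why the first feat_w shifted boxes all share the same shift_y.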

[-64., -64.,  95.,  95.] + (shift_x[0],shift_y[0]) = [-64., -64.,  95.,  95.]
[-64., -64.,  95.,  95.] + (shift_x[1],shift_y[0]) = [-32., -64., 127.,  95.]
[-64., -64.,  95.,  95.] + (shift_x[2],shift_y[0]) = [  0., -64., 159.,  95.]
[-64., -64.,  95.,  95.] + (shift_x[3],shift_y[0]) = [ 32., -64., 191.,  95.]
......

For the 10×10 feature map there are 10 boxes across and 10 down, 100 boxes in total.

[-64., -64.,  95.,  95.],
[-32., -64., 127.,  95.],
[  0., -64., 159.,  95.],
[ 32., -64., 191.,  95.],
[ 64., -64., 223.,  95.],
[ 96., -64., 255.,  95.],
[128., -64., 287.,  95.],
[160., -64., 319.,  95.],
[192., -64., 351.,  95.],
[224., -64., 383.,  95.],

[-64., -32.,  95., 127.],
[-32., -32., 127., 127.],
[  0., -32., 159., 127.],
[ 32., -32., 191., 127.],
[ 64., -32., 223., 127.],
[ 96., -32., 255., 127.],
[128., -32., 287., 127.],
[160., -32., 319., 127.],
[192., -32., 351., 127.],
[224., -32., 383., 127.],

[-64.,   0.,  95., 159.],
[-32.,   0., 127., 159.],
[  0.,   0., 159., 159.],
[ 32.,   0., 191., 159.],
[ 64.,   0., 223., 159.],
[ 96.,   0., 255., 159.],
[128.,   0., 287., 159.],
[160.,   0., 319., 159.],
[192.,   0., 351., 159.],
[224.,   0., 383., 159.],
......

Only the first 30 boxes of the 10×10 feature map are listed above.

The steps above are implemented in the grid_anchors function:

def grid_anchors(self, featmap_size, stride=16, device='cuda'):
    base_anchors = self.base_anchors.to(device)

    feat_h, feat_w = featmap_size

    shift_x = torch.arange(0, feat_w, device=device) * stride
    shift_y = torch.arange(0, feat_h, device=device) * stride

    shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
    shifts = torch.stack([shift_xx, shift_yy, shift_xx, shift_yy], dim=-1)
    shifts = shifts.type_as(base_anchors)
    # first feat_w elements correspond to the first row of shifts
    # add A anchors (1, A, 4) to K shifts (K, 1, 4) to get
    # shifted anchors (K, A, 4), reshape to (K*A, 4)

    all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
    all_anchors = all_anchors.view(-1, 4)
    # first A rows correspond to A anchors of (0, 0) in feature map,
    # then (0, 1), (0, 2), ...
    return all_anchors
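
As a cross-check of the box listing for the 10×10 level, the following sketch rebuilds it with plain Python lists. It assumes one base anchor per location and the row-major ordering of _meshgrid; `grid_anchors_lists` is a hypothetical stand-in, not the nanodet method.

```python
def grid_anchors_lists(base_anchor, featmap_size, stride):
    """Shift one base anchor [x1, y1, x2, y2] over every grid cell, row by row."""
    feat_h, feat_w = featmap_size
    x1, y1, x2, y2 = base_anchor
    anchors = []
    for j in range(feat_h):          # rows: shift_y changes slowest
        for i in range(feat_w):      # columns: shift_x changes fastest
            sx, sy = i * stride, j * stride
            anchors.append([x1 + sx, y1 + sy, x2 + sx, y2 + sy])
    return anchors

anchors = grid_anchors_lists([-64, -64, 95, 95], (10, 10), 32)
print(len(anchors))   # 100
print(anchors[1])     # [-32, -64, 127, 95]
print(anchors[10])    # [-64, -32, 95, 127]
```

The explicit double loop trades the broadcasting trick in grid_anchors for readability; the resulting order is the same.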