Paper: Objects as Points (CVPR 2019)
Code: xingyizhou/CenterNet
CenterNet
Compared with ordinary detectors:
- Uses anchor points instead of anchor boxes
- Predicts at most one object per location, so NMS is not needed (a simple substitute, described below, takes its place)
- The final output feature map has a higher resolution than in typical detectors (×4 downsampling instead of the usual ×16)
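To make the resolution point concrete, here is a quick back-of-the-envelope comparison (the 512 × 512 input size is an assumption; it is a common CenterNet input resolution):

```python
# Output-grid size for a 512x512 input (input size is an assumed example).
input_size = 512
centernet_stride = 4    # CenterNet output stride
typical_stride = 16     # typical detector output stride

centernet_cells = (input_size // centernet_stride) ** 2  # 128 * 128 = 16384
typical_cells = (input_size // typical_stride) ** 2      # 32 * 32 = 1024

print(centernet_cells, typical_cells)  # 16384 1024
```

So at stride 4 there are 16× as many candidate anchor points as at stride 16, which is what makes the dense one-object-per-cell scheme workable.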
Compared with CornerNet:
CornerNet must group its predicted corner points into boxes before producing final predictions, which slows the algorithm down; CenterNet needs no grouping step. CenterNet also has three outputs:
- Heatmap: [1, 80, 128, 128] (80 detection classes; determines the class and gives a coarse location)
- Offset map: [1, 2, 128, 128] (refines the center position; this is the sub-pixel offset of the center point)
- Size map: [1, 2, 128, 128] (a center point only encodes position, so the box width and height must also be predicted)
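As a sketch of how these three maps encode one ground-truth box (variable names are my own; I keep the size target in input pixels for simplicity, whereas the official implementation stores targets at output resolution):

```python
# Hedged sketch: mapping one ground-truth box to the three output maps.
stride = 4  # CenterNet output stride

def encode_box(x1, y1, x2, y2):
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2   # center in input pixels
    px, py = cx / stride, cy / stride       # center on the 128x128 grid
    ix, iy = int(px), int(py)               # integer cell -> Heatmap peak
    offset = (px - ix, py - iy)             # target for the Offset map
    size = (x2 - x1, y2 - y1)               # target for the Size map
    return (ix, iy), offset, size

cell, offset, size = encode_box(101, 40, 180, 121)
print(cell, offset, size)  # (35, 20) (0.125, 0.125) (79, 81)
```

The Offset map exists precisely because of the `int(...)` truncation here: the ×4 downsampling discards sub-cell position, and the offset head learns it back.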
Overall, CenterNet differs little from CornerNet: the loss has a similar form and structure, with one extra size term $\mathcal{L}_{size}$ that uses a plain L1 loss; plain L1 works better here than Smooth L1, as experiments later show.
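The difference between the two regression losses can be seen directly (a minimal sketch; beta = 1 is the usual Smooth L1 transition point):

```python
# Plain L1 vs. Smooth L1 on a size-regression residual. Near zero, Smooth L1
# becomes quadratic and damps the penalty on small residuals; for large
# residuals the two agree up to a constant.
def l1(x):
    return abs(x)

def smooth_l1(x, beta=1.0):
    return 0.5 * x * x / beta if abs(x) < beta else abs(x) - 0.5 * beta

for residual in (0.1, 0.5, 5.0):
    print(residual, l1(residual), smooth_l1(residual))
```

Since box sizes span a wide range and small residuals still matter, keeping the full L1 penalty on small errors is one plausible reason plain L1 regresses sizes better in these experiments.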
NMS can be skipped because the following operations achieve an equivalent effect:
- Apply 3 × 3 max pooling to the Heatmap and keep only local maxima (points that dominate their 8-neighborhood)
- Keep the top 100 peaks
Related code:
```python
# https://github.com/xingyizhou/CenterNet/blob/master/src/lib/models/decode.py
import torch
import torch.nn as nn

from .utils import _gather_feat, _transpose_and_gather_feat

def _nms(heat, kernel=3):
    pad = (kernel - 1) // 2
    hmax = nn.functional.max_pool2d(
        heat, (kernel, kernel), stride=1, padding=pad)  # find local maxima
    keep = (hmax == heat).float()  # keep only the local maxima
    return heat * keep

def _topk(scores, K=40):
    batch, cat, height, width = scores.size()
    # top-K peaks and their indices per class
    topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
    topk_inds = topk_inds % (height * width)
    topk_ys = (topk_inds / width).int().float()
    topk_xs = (topk_inds % width).int().float()
    # top-K peaks and their indices over all classes
    topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
    topk_clses = (topk_ind / K).int()
    topk_inds = _gather_feat(
        topk_inds.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K)
    return topk_score, topk_inds, topk_clses, topk_ys, topk_xs

def ctdet_decode(heat, wh, reg=None, cat_spec_wh=False, K=100):
    batch, cat, height, width = heat.size()
    # heat = torch.sigmoid(heat)
    # perform nms on heatmaps
    heat = _nms(heat)
    scores, inds, clses, ys, xs = _topk(heat, K=K)
    if reg is not None:
        reg = _transpose_and_gather_feat(reg, inds)
        reg = reg.view(batch, K, 2)
        xs = xs.view(batch, K, 1) + reg[:, :, 0:1]
        ys = ys.view(batch, K, 1) + reg[:, :, 1:2]
    else:
        xs = xs.view(batch, K, 1) + 0.5
        ys = ys.view(batch, K, 1) + 0.5
    wh = _transpose_and_gather_feat(wh, inds)
    if cat_spec_wh:
        wh = wh.view(batch, K, cat, 2)
        clses_ind = clses.view(batch, K, 1, 1).expand(batch, K, 1, 2).long()
        wh = wh.gather(2, clses_ind).view(batch, K, 2)
    else:
        wh = wh.view(batch, K, 2)
    clses = clses.view(batch, K, 1).float()
    scores = scores.view(batch, K, 1)
    bboxes = torch.cat([xs - wh[..., 0:1] / 2,
                        ys - wh[..., 1:2] / 2,
                        xs + wh[..., 0:1] / 2,
                        ys + wh[..., 1:2] / 2], dim=2)
    detections = torch.cat([bboxes, scores, clses], dim=2)
    return detections
```
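To see what the last few lines of `ctdet_decode` compute, here is the same box assembly for a single scalar detection (the numbers are made up; everything is in 128 × 128 feature-map coordinates, and post-processing later scales by the ×4 stride):

```python
# Hedged scalar sketch of the box assembly at the end of ctdet_decode:
# refined center = integer peak + predicted offset,
# corners = refined center +/- half the predicted size.
def decode_one(x, y, dx, dy, w, h):
    cx, cy = x + dx, y + dy          # peak coordinates + Offset-map prediction
    return (cx - w / 2, cy - h / 2,  # x1, y1
            cx + w / 2, cy + h / 2)  # x2, y2

box = decode_one(35, 20, 0.125, 0.125, 79 / 4, 81 / 4)
print(box)  # (25.25, 10.0, 45.0, 30.25)
```

This is why no learned box-matching step is needed: every peak already carries everything required to emit a box directly.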
The center-collision problem (multiple objects sharing one center point)
This is a minor issue for CenterNet: the output feature map is high-resolution and predictions are dense, so on the COCO dataset fewer than 0.1% of objects collide.
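A hedged illustration of when a collision does and does not happen (the coordinates are hypothetical):

```python
# Two objects collide only if their centers land in the same stride-4 cell,
# in which case they merge into a single Heatmap peak.
stride = 4

def cell(cx, cy):
    return (int(cx // stride), int(cy // stride))

a = cell(100.0, 60.0)   # first object's center, in input pixels
b = cell(102.5, 61.0)   # heavily overlapping second object: same cell
c = cell(108.0, 60.0)   # a center two pixels further is already separable
print(a, b, c)
```

At stride 16 the same three centers would all share one cell, which is why the ×4 output resolution keeps collisions below 0.1% on COCO.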
Related experiments
Backbone comparison
Comparison with other detectors (with multi-scale evaluation)
References
[1] CornerNet
[2] CenterNet algorithm notes