Paper: Objects as Points (CVPR 2019)
Code: xingyizhou/CenterNet
CenterNet
Compared with ordinary detectors:
- Uses anchor points instead of anchor boxes
- Predicts at most one object per location, so NMS is not needed (a simple substitute, described below, takes its place)
- The final output feature map has a higher resolution than in typical detectors (×4 downsampling instead of the usual ×16)
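To make the resolution point concrete, here is a quick back-of-the-envelope comparison (the 512 × 512 input size is an assumption; it is a common CenterNet input resolution):

```python
# Output-grid size for a 512x512 input (input size is an assumed example).
input_size = 512
centernet_stride = 4    # CenterNet output stride
typical_stride = 16     # typical detector output stride

centernet_cells = (input_size // centernet_stride) ** 2  # 128 * 128 = 16384
typical_cells = (input_size // typical_stride) ** 2      # 32 * 32 = 1024

print(centernet_cells, typical_cells)  # 16384 1024
```

So at stride 4 there are 16× as many candidate anchor points as at stride 16, which is what makes the dense one-object-per-cell scheme workable.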
Compared with CornerNet:
CornerNet must group its predicted corner points into boxes before producing final predictions, which slows the algorithm down; CenterNet needs no grouping step. CenterNet also has three outputs:
- Heatmap: [1, 80, 128, 128] (80 detection classes; determines the class and gives a coarse location)
- Offset map: [1, 2, 128, 128] (refines the center position; this is the sub-pixel offset of the center point)
- Size map: [1, 2, 128, 128] (a center point only encodes position, so the box width and height must also be predicted)
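As a sketch of how these three maps encode one ground-truth box (variable names are my own; I keep the size target in input pixels for simplicity, whereas the official implementation stores targets at output resolution):

```python
# Hedged sketch: mapping one ground-truth box to the three output maps.
stride = 4  # CenterNet output stride

def encode_box(x1, y1, x2, y2):
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2   # center in input pixels
    px, py = cx / stride, cy / stride       # center on the 128x128 grid
    ix, iy = int(px), int(py)               # integer cell -> Heatmap peak
    offset = (px - ix, py - iy)             # target for the Offset map
    size = (x2 - x1, y2 - y1)               # target for the Size map
    return (ix, iy), offset, size

cell, offset, size = encode_box(101, 40, 180, 121)
print(cell, offset, size)  # (35, 20) (0.125, 0.125) (79, 81)
```

The Offset map exists precisely because of the `int(...)` truncation here: the ×4 downsampling discards sub-cell position, and the offset head learns it back.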
Overall, CenterNet differs little from CornerNet: the loss has a similar form and structure, with one extra size term $\mathcal{L}_{size}$ that uses a plain L1 loss; plain L1 works better here than Smooth L1, as experiments later show.
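The difference between the two regression losses can be seen directly (a minimal sketch; beta = 1 is the usual Smooth L1 transition point):

```python
# Plain L1 vs. Smooth L1 on a size-regression residual. Near zero, Smooth L1
# becomes quadratic and damps the penalty on small residuals; for large
# residuals the two agree up to a constant.
def l1(x):
    return abs(x)

def smooth_l1(x, beta=1.0):
    return 0.5 * x * x / beta if abs(x) < beta else abs(x) - 0.5 * beta

for residual in (0.1, 0.5, 5.0):
    print(residual, l1(residual), smooth_l1(residual))
```

Since box sizes span a wide range and small residuals still matter, keeping the full L1 penalty on small errors is one plausible reason plain L1 regresses sizes better in these experiments.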
NMS can be skipped because the following operations achieve an equivalent effect:
- Apply 3 × 3 max pooling to the Heatmap and keep only local maxima (points that dominate their 8-neighborhood)
- Keep the top 100 peaks
Related code:
```python
# https://github.com/xingyizhou/CenterNet/blob/master/src/lib/models/decode.py
import torch
import torch.nn as nn

from .utils import _gather_feat, _transpose_and_gather_feat

def _nms(heat, kernel=3):
    pad = (kernel - 1) // 2
    hmax = nn.functional.max_pool2d(
        heat, (kernel, kernel), stride=1, padding=pad)  # find local maxima
    keep = (hmax == heat).float()  # keep only the local maxima
    return heat * keep

def _topk(scores, K=40):
    batch, cat, height, width = scores.size()
    # top-K peaks and their indices per class
    topk_scores, topk_inds = torch.topk(scores.view(batch, cat, -1), K)
    topk_inds = topk_inds % (height * width)
    topk_ys = (topk_inds / width).int().float()
    topk_xs = (topk_inds % width).int().float()
    # top-K peaks and their indices over all classes
    topk_score, topk_ind = torch.topk(topk_scores.view(batch, -1), K)
    topk_clses = (topk_ind / K).int()
    topk_inds = _gather_feat(
        topk_inds.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_ys = _gather_feat(topk_ys.view(batch, -1, 1), topk_ind).view(batch, K)
    topk_xs = _gather_feat(topk_xs.view(batch, -1, 1), topk_ind).view(batch, K)
    return topk_score, topk_inds, topk_clses, topk_ys, topk_xs

def ctdet_decode(heat, wh, reg=None, cat_spec_wh=False, K=100):
    batch, cat, height, width = heat.size()
    # heat = torch.sigmoid(heat)
    # perform nms on heatmaps
    heat = _nms(heat)
    scores, inds, clses, ys, xs = _topk(heat, K=K)
    if reg is not None:
        reg = _transpose_and_gather_feat(reg, inds)
        reg = reg.view(batch, K, 2)
        xs = xs.view(batch, K, 1) + reg[:, :, 0:1]
        ys = ys.view(batch, K, 1) + reg[:, :, 1:2]
    else:
        xs = xs.view(batch, K, 1) + 0.5
        ys = ys.view(batch, K, 1) + 0.5
    wh = _transpose_and_gather_feat(wh, inds)
    if cat_spec_wh:
        wh = wh.view(batch, K, cat, 2)
        clses_ind = clses.view(batch, K, 1, 1).expand(batch, K, 1, 2).long()
        wh = wh.gather(2, clses_ind).view(batch, K, 2)
    else:
        wh = wh.view(batch, K, 2)
    clses = clses.view(batch, K, 1).float()
    scores = scores.view(batch, K, 1)
    bboxes = torch.cat([xs - wh[..., 0:1] / 2,
                        ys - wh[..., 1:2] / 2,
                        xs + wh[..., 0:1] / 2,
                        ys + wh[..., 1:2] / 2], dim=2)
    detections = torch.cat([bboxes, scores, clses], dim=2)
    return detections
```
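To see what the last few lines of `ctdet_decode` compute, here is the same box assembly for a single scalar detection (the numbers are made up; everything is in 128 × 128 feature-map coordinates, and post-processing later scales by the ×4 stride):

```python
# Hedged scalar sketch of the box assembly at the end of ctdet_decode:
# refined center = integer peak + predicted offset,
# corners = refined center +/- half the predicted size.
def decode_one(x, y, dx, dy, w, h):
    cx, cy = x + dx, y + dy          # peak coordinates + Offset-map prediction
    return (cx - w / 2, cy - h / 2,  # x1, y1
            cx + w / 2, cy + h / 2)  # x2, y2

box = decode_one(35, 20, 0.125, 0.125, 79 / 4, 81 / 4)
print(box)  # (25.25, 10.0, 45.0, 30.25)
```

This is why no learned box-matching step is needed: every peak already carries everything required to emit a box directly.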
The center-collision problem (multiple objects sharing one center point)
This is a minor issue for CenterNet: the output feature map is high-resolution and predictions are dense, so on the COCO dataset fewer than 0.1% of objects collide.
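A hedged illustration of when a collision does and does not happen (the coordinates are hypothetical):

```python
# Two objects collide only if their centers land in the same stride-4 cell,
# in which case they merge into a single Heatmap peak.
stride = 4

def cell(cx, cy):
    return (int(cx // stride), int(cy // stride))

a = cell(100.0, 60.0)   # first object's center, in input pixels
b = cell(102.5, 61.0)   # heavily overlapping second object: same cell
c = cell(108.0, 60.0)   # a center two pixels further is already separable
print(a, b, c)
```

At stride 16 the same three centers would all share one cell, which is why the ×4 output resolution keeps collisions below 0.1% on COCO.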
Related experiments
Backbone comparison
Comparison with other detectors (with multi-scale evaluation)
References
[1] CornerNet
[2] CenterNet algorithm notes