【CornerNet】《CornerNet: Detecting Objects as Paired Keypoints》

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/95236921

ECCV-2018

Code (PyTorch): https://github.com/princeton-vl/CornerNet
Training on your own dataset: see the post "CornerNet训练不完全指南" (An Incomplete Guide to Training CornerNet)

1 Background and Motivation

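The drawbacks of anchor-based methods (the paper calls them "anchor boxes") are:

They need a very large number of boxes; this creates a huge imbalance between positive and negative anchor boxes and slows down training.
They introduce many hyper-parameters and design choices (number, aspect ratio, scale) for designing the anchors.

The authors propose an anchor-free method: predict heatmaps for the top-left and bottom-right corners, and use an embedding vector to group corners into boxes (the top-left and bottom-right corners of the same box have a small embedding distance), which determines the bounding box.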

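The authors hypothesize two reasons why detecting corners would work better than bounding box centers or proposals.

Localizing a center depends on all 4 sides of the object, while a corner depends on only 2 sides, and corner pooling additionally encodes prior knowledge about the definition of corners.
Corners discretize the space of boxes more efficiently: only $O(wh)$ corners are needed to represent $O(w^2h^2)$ possible anchor boxes (any two points combine into a box).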
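
To make the second argument concrete, here is a small counting sketch. It assumes, purely for illustration, that a box is any pair of distinct columns and distinct rows on a `w` × `h` grid; the function names are made up for this example and do not come from the paper or its code.

```python
from math import comb


def num_corner_locations(w: int, h: int) -> int:
    """Number of grid locations where a corner can sit: O(w * h)."""
    return w * h


def num_candidate_boxes(w: int, h: int) -> int:
    """Number of axis-aligned boxes on the grid: O(w^2 * h^2).

    Assumption for illustration: a box is defined by choosing two distinct
    columns (left/right) and two distinct rows (top/bottom).
    """
    return comb(w, 2) * comb(h, 2)


if __name__ == "__main__":
    w = h = 128  # the output resolution mentioned in the Backbone section
    print(num_corner_locations(w, h))   # 16384
    print(num_candidate_boxes(w, h))    # 66064384
```

Even at 128×128, predicting corners means scoring about sixteen thousand locations per heatmap, while enumerating every box directly would mean tens of millions of candidates.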

Honestly, I do not find these two arguments that easy to follow!

2 Advantages / Contributions

first to formulate the task of object detection as a task of detecting and grouping corners simultaneously
corner pooling

3 Method

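The prediction module is adapted from the ResNet bottleneck block.

Core components

Heatmaps (binary masks): predict the top-left and bottom-right corners; each heatmap is H×W×C (C = number of categories), i.e. class-specific.
Embeddings (1-dimensional): one embedding per corner, used to group corners into boxes.
Offsets: refine the bounding box so that the location is more accurate.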
Corner Pooling
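
How these outputs combine into boxes can be sketched roughly as follows. This is a simplified illustration, not the official implementation: the tuple layout and the function name `group_corners` are assumptions made for this example, while the 0.5 embedding-distance threshold and the averaged score follow what the paper describes for test time.

```python
def group_corners(tl_corners, br_corners, max_dist=0.5):
    """Pair top-left and bottom-right corners into boxes.

    Each corner is a tuple (x, y, score, embedding, class_id); this layout is
    hypothetical. A pair is kept only if both corners share a class, the
    bottom-right corner lies below and to the right of the top-left corner,
    and the 1-D embeddings are within max_dist of each other.
    """
    boxes = []
    for tx, ty, ts, te, tc in tl_corners:
        for bx, by, bs, be, bc in br_corners:
            if tc != bc:
                continue  # corners from different categories
            if bx <= tx or by <= ty:
                continue  # geometrically impossible box
            if abs(te - be) > max_dist:
                continue  # embeddings too far apart: different objects
            boxes.append((tx, ty, bx, by, (ts + bs) / 2, tc))
    # Highest-scoring boxes first
    return sorted(boxes, key=lambda b: b[4], reverse=True)


if __name__ == "__main__":
    tl = [(10, 10, 0.9, 0.1, 0), (50, 50, 0.8, 2.0, 0)]
    br = [(40, 40, 0.7, 0.2, 0), (90, 90, 0.6, 2.1, 0)]
    for box in group_corners(tl, br):
        print(box)
```

In the toy input, the first top-left corner is not paired with the second bottom-right corner because their embeddings differ by about 2, even though the geometry would allow it.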

3.1 Backbone

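The backbone is an Hourglass Network of depth 104, built by stacking two hourglasses. The input is 511×511 and the output is 128×128; the channel counts are (256, 384, 384, 384, 512).
Image source for the structure sketch (figure omitted): https://blog.csdn.net/u013841196/article/details/81048237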

3.2 Corner Pooling

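As the paper's example figure shows, a corner is not easy to localize, because it often does not lie on the object itself (which indirectly reflects a limitation of bounding boxes, haha). The authors propose corner pooling to deal with this!

Specifically:
For each location, take the max along each arrow direction (for the top-left corner: over everything below it and over everything to its right), then add the two results.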

Formally:

$f_t$ and $f_l$ are the feature maps that are inputs to the top-left corner pooling layer.
$f_{t_{ij}}$ and $f_{l_{ij}}$ are the vectors at location $(i, j)$ in $f_t$ and $f_l$.
$t_{ij}$ is the result of max-pooling $f_t$ over all locations from $(i, j)$ to $(H, j)$.
$l_{ij}$ is the result of max-pooling $f_l$ over all locations from $(i, j)$ to $(i, W)$.
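
The definitions above can be sketched in plain Python for a single channel. This is only an illustration using nested lists (the official repository implements corner pooling as custom PyTorch layers); the function name `top_left_corner_pool` is an assumption for this example.

```python
def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling for one channel.

    f_t, f_l: H x W nested lists of numbers.
    Returns an H x W nested list whose (i, j) entry is t_ij + l_ij, where
      t_ij = max of f_t over rows i..H-1 in column j (looking downward),
      l_ij = max of f_l over columns j..W-1 in row i (looking rightward).
    """
    H, W = len(f_t), len(f_t[0])
    t = [row[:] for row in f_t]
    l = [row[:] for row in f_l]

    # Scan rows from bottom to top so each t[i][j] reuses t[i + 1][j].
    for i in range(H - 2, -1, -1):
        for j in range(W):
            t[i][j] = max(f_t[i][j], t[i + 1][j])

    # Scan columns from right to left so each l[i][j] reuses l[i][j + 1].
    for i in range(H):
        for j in range(W - 2, -1, -1):
            l[i][j] = max(f_l[i][j], l[i][j + 1])

    return [[t[i][j] + l[i][j] for j in range(W)] for i in range(H)]


if __name__ == "__main__":
    f_t = [[0, 1], [2, 0]]
    f_l = [[0, 3], [4, 0]]
    print(top_left_corner_pool(f_t, f_l))  # [[5, 4], [6, 0]]
```

The reverse scan computes every running maximum in one pass, so the cost is linear in the number of locations rather than quadratic.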

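The bottom-right case is analogous, with the max taken toward the top and toward the left (the right half of the paper's figure).
To be honest, I do not fully understand why this captures corners so well!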

3.3 Loss

The total loss is $L = L_{det} + \alpha L_{pull} + \beta L_{push} + \gamma L_{off}$ .
$\alpha$ and $\beta$ are set to 0.1, and $\gamma$ is set to 1.

1） $L_{det}$

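The classification loss for detection is a variant of focal loss. Let us first recall focal loss; see 【Focal Loss】《Focal Loss for Dense Object Detection》:
$FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$ , where $\gamma$ is the focusing parameter of focal loss (not the loss weight $\gamma$ of the total loss).
The key point: well-classified examples are down-weighted, i.e. the weight of easy samples is reduced!

The authors' modified classification loss is:
$L_{det} = -\frac{1}{N} \sum_{c=1}^{C} \sum_{i=1}^{H} \sum_{j=1}^{W} \begin{cases} (1 - p_{cij})^{\alpha} \log(p_{cij}) & \text{if } y_{cij} = 1 \\ (1 - y_{cij})^{\beta} (p_{cij})^{\alpha} \log(1 - p_{cij}) & \text{otherwise} \end{cases}$
Here $N$ is the number of objects in the image, and $\alpha$ and $\beta$ are focal exponents (the paper uses $\alpha = 2$ and $\beta = 4$), not the weights of the total loss.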

$p_{cij}$ is the score at location $(i, j)$ for class $c$ in the predicted heatmaps
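$y_{cij}$ is the ground-truth heatmap augmented with unnormalized Gaussians. What does that mean? See the explanation below:

In principle the heatmap target is a binary mask: for one box the gt is just two points, and every other location is negative. The authors place an unnormalized 2D Gaussian $e^{-\frac{x^2 + y^2}{2\sigma^2}}$ around each gt corner, with $\sigma$ set to 1/3 of the radius, and call this penalty reduction. When the predicted location coincides with the gt (the center of the circle), $y_{cij}$ is largest and the penalty weight $1-y_{cij}$ is smallest; the farther a location is from the gt (from the center), the smaller $y_{cij}$ and the larger $1-y_{cij}$.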
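
A minimal sketch of building such a target for one corner, assuming the Gaussian is drawn inside a square window of side `2 * radius + 1` around the gt corner and that `radius` is a positive integer. The function name `gaussian_heatmap` is hypothetical; this is not the repository's code.

```python
import math


def gaussian_heatmap(height, width, cy, cx, radius):
    """Ground-truth heatmap for a single corner at row cy, column cx.

    The value is exp(-(dx^2 + dy^2) / (2 * sigma^2)) with sigma = radius / 3,
    so it is exactly 1 at the corner and decays toward 0 near the edge of
    the window. Locations outside the window stay at 0.
    """
    sigma = radius / 3.0
    heat = [[0.0] * width for _ in range(height)]
    for i in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for j in range(max(0, cx - radius), min(width, cx + radius + 1)):
            d2 = (i - cy) ** 2 + (j - cx) ** 2
            heat[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return heat


if __name__ == "__main__":
    heat = gaussian_heatmap(7, 7, cy=3, cx=3, radius=2)
    for row in heat:
        print(" ".join(f"{v:.2f}" for v in row))
```

With several corners of the same class, overlapping Gaussians would typically be merged by taking the element-wise maximum; that merging step is left out here.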

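Why this design? In the paper's figure, the gt bounding box is drawn in red. Boxes grouped from corners that fall within a certain range of the gt corners (the orange circles) still have a high IoU with the gt bounding box, which is why the authors design the loss this way! It looks like cross entropy + focal loss, multiplied by a penalty coefficient $(1-y_{cij})^{\beta}$ .

Note that $y_{cij} = 1$ only at the gt corner itself; inside the orange circle $y_{cij}$ follows the Gaussian and is below 1, so those locations are still negatives, only with a reduced penalty. The radius of the circle is chosen so that a pair of corners within it still forms a box whose IoU with the gt is at least 0.7.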
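
A plain-Python sketch of this modified focal loss for a single-class heatmap. It assumes that $y = 1$ only at exact gt corner locations, that the exponents are the paper's $\alpha = 2$ and $\beta = 4$, and that the sum is normalized by the number of positive locations (standing in for the number of objects $N$). The name `corner_focal_loss` is made up for this example.

```python
import math


def corner_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced focal loss over one H x W heatmap.

    pred: predicted scores in (0, 1); gt: Gaussian-augmented targets in [0, 1].
    Positive locations (gt == 1) use (1 - p)^alpha * log(p).
    All other locations use (1 - y)^beta * p^alpha * log(1 - p), so negatives
    close to a gt corner (y near 1) are penalized less.
    """
    pos_loss, neg_loss, num_pos = 0.0, 0.0, 0
    for p_row, y_row in zip(pred, gt):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1 - eps)  # keep log() finite
            if y == 1:
                pos_loss += (1 - p) ** alpha * math.log(p)
                num_pos += 1
            else:
                neg_loss += (1 - y) ** beta * p ** alpha * math.log(1 - p)
    return -(pos_loss + neg_loss) / max(num_pos, 1)


if __name__ == "__main__":
    # The same wrong score of 0.5, far from a corner vs. right next to one
    print(corner_focal_loss([[0.5]], [[0.0]]))
    print(corner_focal_loss([[0.5]], [[0.9]]))
```

The second call returns a much smaller value than the first: a false positive right next to a true corner is barely punished, which is the point of the penalty reduction.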

2） $L_{pull}$ and $L_{push}$
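Pull the two corners of the same object together and push the corners of different objects apart, similar to minimizing intra-class distance while maximizing inter-class distance (see the explanation in the blog post 目标检测论文阅读：CornerNet)!
$L_{pull} = \frac{1}{N} \sum_{k=1}^{N} \left[ (e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2 \right]$
$L_{push} = \frac{1}{N(N-1)} \sum_{k=1}^{N} \sum_{j=1, j \neq k}^{N} \max(0, \Delta - |e_k - e_j|)$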

The pull loss $L_{pull}$ groups the corners that belong to the same object.
The push loss $L_{push}$ separates the corners of different objects.
$N$ is the number of objects (each object contributes one top-left and one bottom-right corner).
$e_{t_{k}}$ is the embedding (1-dimensional) of the top-left corner of object $k$.
$e_{b_{k}}$ is the embedding (1-dimensional) of the bottom-right corner of object $k$.
$e_k$ is the average of $e_{t_{k}}$ and $e_{b_{k}}$.
$\Delta = 1$

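$L_{push}$ is a hinge loss: the closer $e_k$ and $e_j$ are, the larger the loss; the farther apart they are, the smaller the loss, which reaches its minimum of 0 once $|e_k - e_j| \ge \Delta$.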
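
The two embedding losses can be sketched together in a few lines. This is an illustrative plain-Python version for 1-D embeddings; the function name `pull_push_loss` and the list-based inputs are assumptions for this example, and at least one object is assumed.

```python
def pull_push_loss(tl_emb, br_emb, delta=1.0):
    """Return (L_pull, L_push) for N objects with 1-D corner embeddings.

    tl_emb[k], br_emb[k]: embeddings of the top-left and bottom-right corner
    of object k. e_k is their average.
    """
    n = len(tl_emb)
    means = [(t + b) / 2 for t, b in zip(tl_emb, br_emb)]

    # Pull: both corners of an object should sit at the object's mean.
    pull = sum((t - m) ** 2 + (b - m) ** 2
               for t, b, m in zip(tl_emb, br_emb, means)) / n

    # Push: means of different objects should be at least delta apart.
    push = 0.0
    if n > 1:
        for k in range(n):
            for j in range(n):
                if j != k:
                    push += max(0.0, delta - abs(means[k] - means[j]))
        push /= n * (n - 1)
    return pull, push


if __name__ == "__main__":
    # Two objects, each with consistent corners, means 2.0 apart
    print(pull_push_loss([0.0, 2.0], [0.0, 2.0]))  # (0.0, 0.0)
    # Same corners, but the two objects are only 0.5 apart
    print(pull_push_loss([0.0, 0.5], [0.0, 0.5]))  # (0.0, 0.5)
```

The hinge in the push term means well-separated objects contribute nothing; only pairs closer than `delta` are pushed apart.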

3） $L_{off}$

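This loss makes the localization more precise. The gt offset is given below: $\frac{x_k}{n}$ is the exact position, whereas $\left \lfloor \frac{x_k}{n} \right \rfloor$ is where the corner lands after mapping from the original image to the feature map ($n$ is the downsampling factor).
$o_k = \left( \frac{x_k}{n} - \left \lfloor \frac{x_k}{n} \right \rfloor, \frac{y_k}{n} - \left \lfloor \frac{y_k}{n} \right \rfloor \right)$
A smooth L1 loss is used to learn the offset: $L_{off} = \frac{1}{N} \sum_{k=1}^{N} \text{SmoothL1Loss}(o_k, \hat{o}_k)$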

$x_k$ and $y_k$ are the $x$ and $y$ coordinate for corner $k$
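
A small worked example of the offset target and the smooth L1 penalty, in plain Python. The helper names `corner_offset` and `smooth_l1` are made up for illustration, and the threshold of 1.0 between the quadratic and linear regimes is an assumption matching the common default.

```python
import math


def corner_offset(x, y, n):
    """Fractional part lost when mapping image coords to the heatmap.

    x, y: corner position in the input image; n: downsampling factor.
    """
    fx, fy = x / n, y / n
    return (fx - math.floor(fx), fy - math.floor(fy))


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 for one scalar: quadratic below beta, linear above."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta


if __name__ == "__main__":
    # A corner at (10, 7) in the image, heatmap 4x smaller
    print(corner_offset(10, 7, 4))  # (0.5, 0.75)
    # Loss for a predicted x-offset of 0.0 against the target 0.5
    print(smooth_l1(0.0, 0.5))      # 0.125
```

Here the corner maps to heatmap cell (2, 1); without the offset (0.5, 0.75), remapping to the image would give (8, 4) instead of (10, 7), an error that matters most for small boxes.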

4 Experiments

4.1 Datasets

MS COCO

train+val：135k
mini-val：5k
test-dev：20k

4.2 Ablation Study

1）Corner Pooling
Comparison with and without corner pooling.
The improvement is most noticeable on medium and large objects.
This is expected because the topmost, bottommost, leftmost, rightmost boundaries of medium and large objects are likely to be further away from the corner locations.

2）Reducing penalty to negative locations
we see that the penalty reduction especially benefits medium and large objects.

3）Error Analysis

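This experiment is a neat one.
First the predicted heatmaps are replaced with the gt, then both the heatmaps and the offsets are replaced with the gt. The resulting numbers are startlingly high! This shows that the heatmaps and offsets still have a lot of room for improvement!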

4.3 Comparisons with state-of-the-art detectors

In the comparison table, Cascade R-CNN is quite strong, and so is SNIP.

Demo (figures omitted): top-left and bottom-right corner predictions.

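5 Conclusion (own notes)

If I get the chance, sketch the backbone and read the hourglass paper.
Look into the embedding, and catch up on the current state of keypoint grouping in human pose estimation.
How should corner pooling be understood?

Below are excerpts from some good blog posts I came across.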

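Author: Makalo.W
Source: CSDN
Original: https://blog.csdn.net/weixin_43688730/article/details/84034604
Copyright notice: this is an original article by the blogger; please include the link to the post when reposting.

The role of corner pooling: the authors explain in the paper that the two predicted points do not lie on the "content" but next to it. For example, the orange points in the paper's figure fall beside the person rather than on the person. The information at those locations is not useful by itself; what is useful is the person. Corner pooling therefore transfers the information on the person over to the neighbouring corner locations, so that the model can predict the points more accurately.

ECCV-2018最佼佼者的目标检测算法 (The top object detection algorithm of ECCV-2018)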
