CornerNet: Detecting Objects as Paired Keypoints

We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network.

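Drawbacks of Anchor Boxes

  1. A very large set of anchor boxes leads to a huge imbalance between positive and negative anchor boxes, which slows down training.

  2. Anchor boxes introduce many hyperparameters and design choices: how many boxes, what sizes, and what aspect ratios.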

Overview


We detect an object as a pair of keypoints: the top-left corner and the bottom-right corner of the bounding box. We use a single convolutional network to predict a heatmap for the top-left corners of all instances of the same object category, a heatmap for all bottom-right corners, and an embedding vector for each detected corner. The embeddings serve to group a pair of corners that belong to the same object.

The method has two parts: keypoint detection and keypoint grouping.

Three main problems:

  1. How to detect the keypoints?
  2. How to group the keypoints?
  3. A corner of a bounding box often lies outside the object. How can corner localization be improved in that case?

1201067-20180820171017356-876848328.png

Detecting Corners


Backbone: an hourglass network or another network designed for human pose estimation; this paper uses the hourglass network.

Output: Two sets of heatmaps, one for top-left corners and one for bottom-right corners. Each set of heatmaps has C channels, where C is the number of categories.

Loss: Instead of equally penalizing negative locations, we reduce the penalty given to negative locations within a radius of the positive location. We determine the radius by the size of an object by ensuring that a pair of points within the radius would generate a bounding box with at least 0.7 IoU with the ground-truth annotation.
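The reduced penalty can be illustrated by writing an unnormalized 2D Gaussian around each ground-truth corner into the target heatmap. The sketch below is only an illustration under stated assumptions: the function name `draw_corner_gaussian` is hypothetical, the radius is passed in directly rather than derived from the 0.7 IoU condition, and sigma is taken as one third of the radius.

```python
import math

def draw_corner_gaussian(heatmap, cx, cy, radius):
    """Write an unnormalized Gaussian centred on (cx, cy) into heatmap, in place."""
    if radius < 1:
        # Degenerate case: only the corner location itself is marked.
        heatmap[cy][cx] = 1.0
        return heatmap
    sigma = radius / 3.0  # assumption: sigma is one third of the radius
    height, width = len(heatmap), len(heatmap[0])
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            squared_dist = (x - cx) ** 2 + (y - cy) ** 2
            value = math.exp(-squared_dist / (2 * sigma ** 2))
            # Keep the larger value where Gaussians of nearby corners overlap.
            heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap

# One corner at the centre of a 7x7 heatmap for a single category.
heatmap = [[0.0] * 7 for _ in range(7)]
draw_corner_gaussian(heatmap, cx=3, cy=3, radius=2)
```

A target value close to 1 near the corner can then be used to down-weight the loss at that negative location, for example with a factor of the form (1 - value) raised to a power, while locations outside the radius keep the full penalty.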

Offset prediction: A location \(\left( x,y \right)\) in the image is mapped to the location \(\left( \left\lfloor \frac{x}{n} \right\rfloor,\left\lfloor \frac{y}{n} \right\rfloor \right)\) in the heatmaps, where \(n\) is the downsampling factor. Because the floor operation loses precision, we predict location offsets to slightly adjust the corner locations before remapping them to the input resolution.
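To make the mapping concrete, here is a small sketch of how a ground-truth offset could be computed and then used to recover the original location. The helper names `to_heatmap_location` and `to_image_location` and the example numbers are assumptions made for illustration.

```python
import math

def to_heatmap_location(x, y, n):
    """Map an image location to a heatmap cell and the offset lost by flooring."""
    col, row = math.floor(x / n), math.floor(y / n)
    offset = (x / n - col, y / n - row)
    return (col, row), offset

def to_image_location(col, row, offset, n):
    """Add the offset back before remapping to the input resolution."""
    return ((col + offset[0]) * n, (row + offset[1]) * n)

# Hypothetical corner at (103, 58) with a downsampling factor of 4.
(col, row), offset = to_heatmap_location(x=103, y=58, n=4)
restored = to_image_location(col, row, offset, n=4)
```

Without the offset, the corner would be remapped to (100, 56), an error of up to n - 1 pixels per axis, which matters most for small boxes.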

Grouping Corners


Multiple objects may appear in an image, and thus multiple top-left and bottom-right corners may be detected. We need to determine if a pair of the top-left corner and bottom-right corner is from the same bounding box.

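The network predicts an embedding vector for each detected corner.

If a top-left corner and a bottom-right corner belong to the same bounding box, the distance between their embeddings should be small; otherwise it should be large.

Training uses a "pull" loss to bring together the embeddings of corners from the same object and a "push" loss to separate the embeddings of corners from different objects.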

1201067-20180820190915689-442120193.png
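The two losses can be sketched in a few lines. This is a minimal illustration, assuming one-dimensional embeddings, a margin `delta` of 1, and that index k in both lists refers to the same object; the function name `pull_push_loss` is hypothetical.

```python
def pull_push_loss(top_left, bottom_right, delta=1.0):
    """Return (pull, push) for scalar corner embeddings of n objects."""
    n = len(top_left)
    # Reference embedding of each object: the mean of its two corner embeddings.
    centers = [(tl + br) / 2 for tl, br in zip(top_left, bottom_right)]
    # Pull: both corners of an object should sit close to the object's mean.
    pull = sum(
        (tl - c) ** 2 + (br - c) ** 2
        for tl, br, c in zip(top_left, bottom_right, centers)
    ) / n
    # Push: means of different objects should be at least delta apart.
    push = 0.0
    if n > 1:
        push = sum(
            max(0.0, delta - abs(centers[k] - centers[j]))
            for k in range(n)
            for j in range(n)
            if j != k
        ) / (n * (n - 1))
    return pull, push

# Two objects whose corners agree internally but whose means are only 0.5 apart.
pull, push = pull_push_loss(top_left=[0.0, 0.5], bottom_right=[0.0, 0.5])
```

In this example the pull term is zero because each pair of corners already shares the same embedding, while the push term is positive because the two objects are closer than the margin.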

Corner Pooling


There is often no local visual evidence for the presence of corners, so we propose corner pooling to better localize the corners by encoding explicit prior knowledge. For example, top-left corner pooling:

1201067-20180820191642269-1372304581.png
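Top-left corner pooling can be sketched as two running maxima followed by a sum: at each location, take the maximum of one feature map over everything below it, the maximum of a second feature map over everything to its right, and add the two. The code below is a plain-Python sketch on single-channel maps; the function name and the tiny 2x2 inputs are assumptions for illustration, not the paper's implementation.

```python
def top_left_corner_pool(f_t, f_l):
    """Pool f_t downwards and f_l rightwards, then add the results."""
    height, width = len(f_t), len(f_t[0])
    t = [row[:] for row in f_t]
    l = [row[:] for row in f_l]
    # Scan from bottom to top: each cell becomes the max of itself and all cells below.
    for y in range(height - 2, -1, -1):
        for x in range(width):
            t[y][x] = max(t[y][x], t[y + 1][x])
    # Scan from right to left: each cell becomes the max of itself and all cells to its right.
    for y in range(height):
        for x in range(width - 2, -1, -1):
            l[y][x] = max(l[y][x], l[y][x + 1])
    return [[t[y][x] + l[y][x] for x in range(width)] for y in range(height)]

f_t = [[1, 0],
       [0, 2]]
f_l = [[0, 3],
       [1, 0]]
pooled = top_left_corner_pool(f_t, f_l)
# pooled == [[4, 5], [1, 2]]
```

The running maximum makes each direction a single pass over the map, so the cost stays linear in the number of locations. Bottom-right corner pooling would scan in the opposite directions.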

Finally CornerNet

1201067-20180820191819817-1303759183.png

Experiments

Effectiveness of corner pooling
1201067-20180820192240863-599172509.png
Effectiveness of reducing the penalty to negative locations
1201067-20180820192428430-197755341.png

Reposted from: https://www.cnblogs.com/xiongzihua/p/9506645.html
