CornerNet: Detecting Objects as Paired Keypoints

天为我蓝

Original link: http://www.cnblogs.com/xiongzihua/p/9506645.html

We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network.

Drawbacks of Anchor Boxes

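A very large set of anchor boxes is needed, and only a tiny fraction of them overlap sufficiently with any ground-truth box, which creates a huge imbalance between positive and negative anchor boxes.
Anchor boxes introduce many hyperparameters and design choices: how many boxes, what sizes, and what aspect ratios.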

Overview

We detect an object as a pair of keypoints: the top-left corner and the bottom-right corner of the bounding box. We use a single convolutional network to predict a heatmap for the top-left corners of all instances of the same object category, a heatmap for all bottom-right corners, and an embedding vector for each detected corner. The embeddings serve to group a pair of corners that belong to the same object.

Detection therefore splits into two subproblems: keypoint detection and keypoint grouping.

Three main problems:

How do we detect the corner keypoints?
How do we group corners that belong to the same object?
A corner of a bounding box is often outside the object; how can we still localize it accurately?

Detecting Corners

Backbone: a network designed for human pose estimation; other pose-estimation networks could be used, but this paper uses the Hourglass network.

Output: Two sets of heatmaps, one for top-left corners and one for bottom-right corners. Each set of heatmaps has C channels, where C is the number of categories (there is no background channel), and each channel marks the corner locations for one category.
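At test time, the CornerNet paper extracts corner candidates from these heatmaps by applying non-maximum suppression with a 3×3 max-pooling layer and then keeping the top-scoring peaks (the top 100 of each corner type). The sketch below illustrates that step on small nested lists in plain Python; `local_maxima` and `top_k_corners` are hypothetical helper names, and a real implementation would operate on framework tensors instead.

```python
import heapq


def local_maxima(heatmap):
    """Keep only scores that equal the maximum of their 3x3 neighbourhood.

    heatmap: H x W list of rows for one category channel. This is a
    pure-Python stand-in (hypothetical helper) for 3x3 max-pooling NMS.
    """
    h, w = len(heatmap), len(heatmap[0])
    kept = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neighbourhood = [
                heatmap[y][x]
                for y in range(max(0, i - 1), min(h, i + 2))
                for x in range(max(0, j - 1), min(w, j + 2))
            ]
            if heatmap[i][j] == max(neighbourhood):
                kept[i][j] = heatmap[i][j]
    return kept


def top_k_corners(heatmaps, k):
    """Return the k best (score, category, row, col) peaks across all C channels."""
    candidates = []
    for c, channel in enumerate(heatmaps):
        for i, row in enumerate(local_maxima(channel)):
            for j, score in enumerate(row):
                if score > 0:  # skip suppressed or empty locations
                    candidates.append((score, c, i, j))
    return heapq.nlargest(k, candidates)
```

Because the suppression only compares each location with its immediate neighbours, two adjacent responses to the same corner collapse into one candidate, while peaks in different channels never suppress each other.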

Loss: Instead of penalizing all negative locations equally, we reduce the penalty given to negative locations within a radius of the positive location, because a pair of slightly misplaced corners can still produce a box that overlaps the ground truth well. We determine the radius from the size of the object, ensuring that a pair of points within the radius would generate a bounding box with at least 0.7 IoU with the ground-truth annotation.
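In the paper, the reduced penalty is an unnormalized 2D Gaussian centered at each ground-truth corner, with σ equal to 1/3 of the radius, and it is used inside a variant of focal loss with α = 2 and β = 4. The following is a minimal sketch of the three pieces, assuming single-channel heatmaps stored as lists of rows. The radius search is an assumption: it checks three worst cases (both corners moved inward, both moved outward, or both shifted the same way), similar to what the public CornerNet code does, but solves them by bisection rather than in closed form, and the Gaussian is cut off at a circular radius. All function names are hypothetical.

```python
import math


def corner_radius(w, h, min_iou=0.7, iters=60):
    """Largest r such that moving both corners by up to r keeps IoU >= min_iou.

    Assumption: r is applied along both x and y, and three worst cases are
    checked: corners moved inward, moved outward, or shifted together.
    """
    def worst_iou(r):
        inward = (w - 2 * r) * (h - 2 * r) / (w * h)
        outward = (w * h) / ((w + 2 * r) * (h + 2 * r))
        overlap = (w - r) * (h - r)
        shifted = overlap / (2 * w * h - overlap)
        return min(inward, outward, shifted)

    lo, hi = 0.0, min(w, h) / 2  # at hi the inward box collapses to zero area
    for _ in range(iters):  # bisection is valid because worst_iou decreases with r
        mid = (lo + hi) / 2
        if worst_iou(mid) >= min_iou:
            lo = mid
        else:
            hi = mid
    return lo


def gaussian_target(height, width, cx, cy, radius):
    """Target heatmap for one corner: 1 at (cy, cx), unnormalized Gaussian inside radius."""
    sigma = radius / 3  # sigma = 1/3 of the radius, as stated in the paper
    target = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            d2 = (j - cx) ** 2 + (i - cy) ** 2
            if d2 == 0:
                target[i][j] = 1.0
            elif d2 <= radius * radius:  # circular cut-off is an assumption
                target[i][j] = math.exp(-d2 / (2 * sigma * sigma))
    return target


def corner_focal_loss(pred, target, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced focal loss for one heatmap channel (lists of rows).

    Positives (y == 1) contribute (1 - p)^alpha * log(p); negatives are
    down-weighted by (1 - y)^beta, so near-misses cost less.
    """
    total, num_pos = 0.0, 0
    for p_row, y_row in zip(pred, target):
        for p, y in zip(p_row, y_row):
            if y == 1:
                total += (1 - p) ** alpha * math.log(p + eps)
                num_pos += 1
            else:
                total += (1 - y) ** beta * p ** alpha * math.log(1 - p + eps)
    return -total / max(num_pos, 1)  # normalize by the number of corners
```

For a 100 × 100 box the inward case is the binding one, giving a radius of about 8.17 heatmap units; a false positive one cell away from the true corner is then weighted by roughly (1 − e^{-0.5})^4 ≈ 0.024 instead of 1.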

Offset prediction: A location \(\left( x, y \right)\) in the image is mapped to the location \(\left( \left\lfloor \frac{x}{n} \right\rfloor, \left\lfloor \frac{y}{n} \right\rfloor \right)\) in the heatmaps, where n is the downsampling factor. Flooring discards precision, so we predict location offsets to slightly adjust the corner locations before remapping them to the input resolution.
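The paper defines the offset target for corner k as \(o_k = \left( \frac{x_k}{n} - \left\lfloor \frac{x_k}{n} \right\rfloor, \frac{y_k}{n} - \left\lfloor \frac{y_k}{n} \right\rfloor \right)\), trains it with smooth L1 loss at the ground-truth corner locations, and shares one set of offsets across all categories for each corner type. A minimal sketch with hypothetical helper names, using n = 4 purely as an example:

```python
import math


def offset_target(x, y, n):
    """Heatmap cell for image point (x, y) and the fractional offset lost by flooring."""
    cx, cy = math.floor(x / n), math.floor(y / n)
    return (cx, cy), (x / n - cx, y / n - cy)


def remap(cell, offset, n):
    """Map a heatmap cell plus a predicted offset back to input-image coordinates."""
    return ((cell[0] + offset[0]) * n, (cell[1] + offset[1]) * n)


def smooth_l1(pred, target):
    """Smooth L1 loss on one offset component."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1 else d - 0.5


cell, off = offset_target(103, 58, n=4)
print(cell, off)              # (25, 14) (0.75, 0.5)
print(remap(cell, off, n=4))  # (103.0, 58.0)
```

Without the offset, the cell (25, 14) would remap to (100, 56), a 3-pixel error that matters most for small objects.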

Grouping Corners

Multiple objects may appear in an image, and thus multiple top-left and bottom-right corners may be detected. We need to determine whether a given top-left corner and bottom-right corner come from the same bounding box.

The network predicts an embedding vector for each detected corner.

If a top-left corner and a bottom-right corner belong to the same bounding box, the distance between their embeddings should be small; otherwise it should be large.
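At test time, the paper rejects a top-left/bottom-right pair if the two corners come from different categories or if the L1 distance between their embeddings is greater than 0.5; the score of each surviving box is the average of its two corner scores. The sketch below assumes 1-D embeddings (the dimension used in the paper) and a hypothetical `(score, category, x, y, embedding)` tuple layout. It also adds a geometric check that the bottom-right corner lies below and to the right of the top-left one; that check is an assumption, not something stated above.

```python
def pair_corners(tl, br, max_emb_dist=0.5):
    """Group top-left and bottom-right corners into scored boxes.

    tl, br: lists of (score, category, x, y, embedding), 1-D embeddings.
    Returns (score, category, (x1, y1, x2, y2)) sorted by descending score.
    """
    boxes = []
    for s1, c1, x1, y1, e1 in tl:
        for s2, c2, x2, y2, e2 in br:
            if c1 != c2:
                continue  # corners of different categories never pair
            if x2 <= x1 or y2 <= y1:
                continue  # assumption: reject geometrically invalid boxes
            if abs(e1 - e2) > max_emb_dist:
                continue  # embeddings too far apart: different objects
            boxes.append(((s1 + s2) / 2, c1, (x1, y1, x2, y2)))
    return sorted(boxes, reverse=True)
```

With two objects whose embeddings sit near 0.1 and 2.0, the cross pairs are rejected by the embedding test or the geometric test, and only the two correct boxes survive.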

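The embeddings are trained with a "pull" loss that groups the two corners of the same object and a "push" loss that separates corners of different objects.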
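With 1-D embeddings, the paper defines, for object k with top-left embedding \(e_{t_k}\), bottom-right embedding \(e_{b_k}\) and their mean \(e_k\), the losses \(L_{pull} = \frac{1}{N}\sum_{k=1}^{N}\left[(e_{t_k}-e_k)^2 + (e_{b_k}-e_k)^2\right]\) and \(L_{push} = \frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{j=1, j\neq k}^{N}\max\left(0, \Delta - \left|e_k - e_j\right|\right)\), with Δ = 1. A direct plain-Python transcription, where `pull_push_loss` is a hypothetical name:

```python
def pull_push_loss(tl_emb, br_emb, delta=1.0):
    """Pull and push losses for 1-D corner embeddings.

    tl_emb[k], br_emb[k]: embeddings of the top-left / bottom-right corner
    of ground-truth object k.
    """
    n = len(tl_emb)
    means = [(t + b) / 2 for t, b in zip(tl_emb, br_emb)]
    # pull: draw both corners of each object toward their mean e_k
    pull = sum((t - m) ** 2 + (b - m) ** 2
               for t, b, m in zip(tl_emb, br_emb, means)) / max(n, 1)
    # push: penalize object means that are closer than delta
    push = 0.0
    if n > 1:
        for k in range(n):
            for j in range(n):
                if j != k:
                    push += max(0.0, delta - abs(means[k] - means[j]))
        push /= n * (n - 1)
    return pull, push
```

The push term is a hinge: once two objects' mean embeddings are at least Δ apart, they stop contributing, so the network is not rewarded for spreading embeddings arbitrarily far.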

Corner Pooling

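Because a corner of a bounding box is often outside the object, there is often no local visual evidence for the presence of corners. We propose corner pooling to better localize the corners by encoding explicit prior knowledge. Take top-left corner pooling as an example: to decide whether location (i, j) is a top-left corner, it max-pools one feature map from (i, j) down to the bottom boundary, max-pools a second feature map from (i, j) right to the right boundary, and adds the two results.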
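Top-left corner pooling reduces to two running maxima: \(t_{ij}\) is the maximum of feature map \(f_t\) from row i down to the bottom of column j, \(l_{ij}\) is the maximum of \(f_l\) from column j to the right end of row i, and the output at (i, j) is \(t_{ij} + l_{ij}\). Both maxima can be computed with a single bottom-up scan and a single right-to-left scan. A plain-Python sketch on H × W lists of rows, where `top_left_corner_pool` is a hypothetical name and the convolution layers that surround the pooling in the full network are omitted:

```python
def top_left_corner_pool(f_t, f_l):
    """Top-left corner pooling on two H x W feature maps (lists of rows)."""
    h, w = len(f_t), len(f_t[0])
    # t[i][j] = max(f_t[i][j], t[i+1][j]): scan each column bottom-up
    t = [row[:] for row in f_t]
    for i in range(h - 2, -1, -1):
        for j in range(w):
            t[i][j] = max(t[i][j], t[i + 1][j])
    # l[i][j] = max(f_l[i][j], l[i][j+1]): scan each row right-to-left
    l = [row[:] for row in f_l]
    for i in range(h):
        for j in range(w - 2, -1, -1):
            l[i][j] = max(l[i][j], l[i][j + 1])
    return [[t[i][j] + l[i][j] for j in range(w)] for i in range(h)]
```

In the toy case below, a strong response on the object's left boundary (bottom of column 1) and one on its top boundary (right end of row 0) meet at location (0, 1), which receives the largest pooled value even though neither response is located there:

```python
f_t = [[0, 0, 0], [0, 0, 0], [0, 5, 0]]
f_l = [[0, 0, 3], [0, 0, 0], [0, 0, 0]]
print(top_left_corner_pool(f_t, f_l))  # [[3, 8, 3], [0, 5, 0], [0, 5, 0]]
```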