[paper reading] CenterNet (Triplets)
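GitHub: Notes of Classic Detection Papers
Update 2020.11.09: added the Use Yourself section, i.e. my own understanding of and thoughts on this paper. For details, see GitHub: Notes of Classic Detection Papers
I originally wanted to host these notes on GitHub, but GitHub does not render formulas.
So they ended up on CSDN, where the formatting is also a bit messy.
I strongly recommend downloading the source files from GitHub and reading them there, since that gives the best reading experience.
And if the notes are useful, a star would be appreciated!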
topic | motivation | technique | key element | math | use yourself | related work |
---|---|---|---|---|---|---|
CenterNet (triplets) | Problem to Solve; Idea; Intuition | CenterNet Architecture; Center Pooling; Cascade Corner Pooling; Central Region Exploration | Baseline: CornerNet; Generating BBox; Training; Inferencing; Ablation Experiment; Error Analysis; Metric AP & AR & FD; Small & Medium & Large | Central Region; Loss Function | …… | Related Work |
Motivation
Problem to Solve
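Drawback of keypoint-based methods (here this mainly means CornerNet):
Without an additional look at the cropped region, the model cannot capture the visual patterns inside a bounding box region, so it produces a large number of incorrect bounding boxes.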
Idea
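Represent each object with a keypoint triplet (top-left corner, bottom-right corner, and center).
The top-left and bottom-right corners encode the boundary information, while the added center keypoint lets the model explore the visual patterns within each predicted bounding box (i.e. obtain the object's internal information).
Concretely, exploring the visual patterns within an object is turned into a keypoint detection problem.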
Intuition
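The idea partly follows RoI Pooling: through efficient discrimination (the Central Region), a one-stage method gains some of the resampling ability of two-stage methods.
Specifically: if a predicted bounding box has a high IoU with the ground-truth box, then the center keypoint in its Central Region is very likely to be predicted as the same class.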
Technique
CenterNet Architecture
Components
- [Center Pooling](#Center Pooling)
- [Cascade Corner Pooling](#Cascade Corner Pooling)
- [Central Region Exploration](#Central Region Exploration)
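Improvement
- AP Improvement
  AP improves for small, medium, and large objects, with most of the gain coming from small objects.
  Reason: center information. The smaller an incorrect bounding box is, the less likely a center keypoint is detected in its Central Region, so small incorrect boxes are the easiest to filter out.
- AR Improvement
  Reason: incorrect bounding boxes are filtered out, which is equivalent to raising the confidence of bounding boxes that are accurately located but have lower scores.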
Center Pooling
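Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.
Why
The geometric center of an object does not necessarily carry a recognizable visual pattern.
Purpose
Better detection of center keypoints!
Specifically, it gives the Central Region recognizable visual patterns, so that the model can perceive the information at the center of a proposal and use it to check whether the bounding box is correct.
Steps
For the feature map fed into Center Pooling, take the maximum response along the horizontal direction and along the vertical direction, then sum them:
- the backbone outputs a feature map
- for each pixel, find the maximum value along its horizontal direction and along its vertical direction
- add the two maxima together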
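The three steps above can be sketched in a few lines of NumPy. This is a minimal single-channel illustration, not the authors' implementation; the function name `center_pooling` and the toy feature map are made up for the example.

```python
import numpy as np

def center_pooling(feature_map: np.ndarray) -> np.ndarray:
    """For every pixel, add the max of its row to the max of its column.

    feature_map: single-channel array of shape (H, W).
    """
    row_max = feature_map.max(axis=1, keepdims=True)  # (H, 1): max along the horizontal direction
    col_max = feature_map.max(axis=0, keepdims=True)  # (1, W): max along the vertical direction
    return row_max + col_max                          # broadcasts to (H, W)

fmap = np.array([[1.0, 0.0, 2.0],
                 [0.0, 5.0, 0.0],
                 [3.0, 0.0, 1.0]])
pooled = center_pooling(fmap)
print(pooled)
```

Every output pixel only depends on its own row and column, which is why a strong response anywhere along those two lines can "light up" the geometric center.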
Cascade Corner Pooling
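Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.
Why
Corners usually lie outside the object, so they lack local appearance features.
Purpose
Better detection of corners!
Specifically, it enriches the information collected by the top-left and bottom-right corners, so that they perceive both boundary and internal information.
Steps
Take the maximum response along the boundary direction and along the internal direction of the input feature map, then sum them (pooling in two directions is more stable and robust, which improves both precision and recall):
- look along the boundary direction to find the boundary max
- from the location of the boundary max, look along the internal direction to find the internal max
- add the two maxima together (and write the sum at the corner location)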
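A literal, loop-based sketch of these steps for one case (the top boundary of a top-left corner) might look as follows. It assumes that "boundary direction" means scanning to the right along the row and "internal direction" means scanning downward from the boundary max. The function name `cascade_top_pooling` is hypothetical, and the real network builds this from corner pooling layers rather than Python loops.

```python
import numpy as np

def cascade_top_pooling(fmap: np.ndarray) -> np.ndarray:
    """Loop-based sketch of cascade pooling for the top boundary.

    For a candidate corner at (i, j):
      1. boundary direction: scan row i from column j to the right and take the max
      2. internal direction: from the column of that max, scan downward from row i and take the max
      3. write the sum of the two maxima at (i, j)
    """
    h, w = fmap.shape
    out = np.zeros_like(fmap)
    for i in range(h):
        for j in range(w):
            k = j + int(np.argmax(fmap[i, j:]))  # column of the boundary max
            boundary_max = fmap[i, k]
            internal_max = fmap[i:, k].max()     # includes (i, k) itself, an assumption of this sketch
            out[i, j] = boundary_max + internal_max
    return out

fmap = np.array([[0.0, 2.0, 1.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 6.0, 3.0]])
out = cascade_top_pooling(fmap)
print(out[0])
```

For the corner candidate at (0, 0), the boundary max is 2 (column 1) and the internal max found below it is 6, so the response written at the corner is 8.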
Central Region Exploration
Scale-Aware Central Region
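- Reason: the trade-off between $\text{recall}$ and $\text{precision}$
- Choosing the Central Region: generate Central Regions of different sizes for bounding boxes of different sizes
  - small bounding box ==> relatively large central region
    Reason: a small central region leads to low recall for small bounding boxes
  - large bounding box ==> relatively small central region
    Reason: a large central region leads to low precision for large bounding boxes
Two kinds of Central Region are used in the experiments.
Which one is used depends on the scale of the bounding box:
- scale $< 150$: $n = 3$ (left panel of the original figure)
- scale $> 150$: $n = 5$ (right panel of the original figure)
Exploration
A bounding box is kept only if both conditions hold:
- a center keypoint falls in its Central Region
- the center keypoint has the same class as the bounding box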
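As a worked example of the scale rule, the sketch below computes the Central Region for a small and a large box. The region formula is the one given in the CenterNet paper as I understand it. Treating "scale" as the square root of the box area, and sending a box at exactly 150 to $n = 5$, are assumptions of this sketch, and the names `central_region`, `pick_n`, and `contains` are hypothetical.

```python
def central_region(box, n):
    """Return (ctl_x, ctl_y, cbr_x, cbr_y), the central region of `box`.

    box = (tl_x, tl_y, br_x, br_y); n is odd and controls the region size.
    """
    tl_x, tl_y, br_x, br_y = box
    ctl_x = ((n + 1) * tl_x + (n - 1) * br_x) / (2 * n)
    ctl_y = ((n + 1) * tl_y + (n - 1) * br_y) / (2 * n)
    cbr_x = ((n - 1) * tl_x + (n + 1) * br_x) / (2 * n)
    cbr_y = ((n - 1) * tl_y + (n + 1) * br_y) / (2 * n)
    return (ctl_x, ctl_y, cbr_x, cbr_y)

def pick_n(box, scale_threshold=150.0):
    """Choose n from the box scale (assumed here to be sqrt(width * height))."""
    tl_x, tl_y, br_x, br_y = box
    scale = ((br_x - tl_x) * (br_y - tl_y)) ** 0.5
    return 3 if scale < scale_threshold else 5

def contains(box, point):
    """True if `point` = (x, y) lies in the scale-aware central region of `box`."""
    x1, y1, x2, y2 = central_region(box, pick_n(box))
    return x1 <= point[0] <= x2 and y1 <= point[1] <= y2

small_box = (0, 0, 90, 90)
large_box = (0, 0, 300, 300)
print(central_region(small_box, pick_n(small_box)))  # middle third of the box
print(central_region(large_box, pick_n(large_box)))  # middle fifth of the box
```

With $n = 3$ the region covers the middle third of the box in each dimension, and with $n = 5$ only the middle fifth, which matches "small box, relatively large region" and "large box, relatively small region".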
Key Element
Baseline: CornerNet
Three outputs
- heatmap:
  - top-left corner
  - bottom-right corner
  Each heatmap encodes 2 things:
  - the locations of keypoints of different categories
  - the confidence score of each keypoint
- embedding:
  used to group the corners
- offset:
  used to remap the corners from the heatmap to the input image
Generate BBox
- take the top-100 top-left corners and the top-100 bottom-right corners separately
- group the corners by embedding distance (a pair is kept when its embedding distance $< \text{Threshold}$)
- compute the confidence score of each bounding box (the average of the 2 corner scores)
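The pairing procedure above can be sketched as follows. This is a simplified illustration with 1-D embeddings, not CornerNet's actual code: the `Corner` type, the default `threshold=0.5`, and the extra checks (same class, bottom-right corner really below and to the right) are assumptions added for the example.

```python
from typing import List, NamedTuple, Tuple

class Corner(NamedTuple):
    x: float
    y: float
    score: float
    embedding: float  # 1-D embedding used for grouping
    cls: int

def top_k(corners: List[Corner], k: int) -> List[Corner]:
    """Keep the k highest-scoring corners."""
    return sorted(corners, key=lambda c: c.score, reverse=True)[:k]

def group_corners(tl_corners, br_corners, threshold=0.5, k=100) -> List[Tuple]:
    """Pair top-left and bottom-right corners whose embeddings are close."""
    boxes = []
    best_br = top_k(br_corners, k)
    for tl in top_k(tl_corners, k):
        for br in best_br:
            if tl.cls != br.cls:                  # assumed check: both corners share a class
                continue
            if br.x <= tl.x or br.y <= tl.y:      # assumed check: valid box geometry
                continue
            if abs(tl.embedding - br.embedding) >= threshold:
                continue                          # embeddings too far apart, different objects
            score = (tl.score + br.score) / 2     # average of the 2 corner scores
            boxes.append((tl.x, tl.y, br.x, br.y, score, tl.cls))
    return boxes

tl = [Corner(10, 10, 0.9, 0.10, 0)]
br = [Corner(50, 60, 0.7, 0.15, 0), Corner(80, 90, 0.8, 1.40, 0)]
boxes = group_corners(tl, br)
print(boxes)
```

Only the first bottom-right corner is paired here, because the second one has an embedding far from the top-left corner's. Nothing in this procedure looks inside the box, which is where the incorrect bounding boxes discussed next come from.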
Drawbacks
CornerNet has a high False Discovery rate (FD), i.e. it produces a large number of incorrect bounding boxes.
For the meaning of AP & FD, see [Metric AP & AR & FD](#Metric AP & AR & FD)
Generating BBox
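- select the top-k center keypoints
- remap the center keypoints to the input image (using the offsets)
- define a Central Region in each bounding box
- keep the bounding boxes that meet both requirements:
  - a center keypoint falls in the Central Region
  - the center keypoint has the same class as the bounding box
- compute the score of each bounding box
  as the average score of the top-left corner, bottom-right corner, and center keypoint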
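Put together, the filtering and rescoring steps can be sketched like this. It is a minimal illustration under several assumptions: center keypoints are already remapped to image coordinates, box scale is the square root of the area, and when several centers match a box the best-scoring one is used. The names `in_central_region` and `filter_boxes` are hypothetical.

```python
def in_central_region(box, point, n):
    """Check whether `point` lies in the central region of `box` for a given odd n."""
    tl_x, tl_y, br_x, br_y = box
    lo_x = ((n + 1) * tl_x + (n - 1) * br_x) / (2 * n)
    hi_x = ((n - 1) * tl_x + (n + 1) * br_x) / (2 * n)
    lo_y = ((n + 1) * tl_y + (n - 1) * br_y) / (2 * n)
    hi_y = ((n - 1) * tl_y + (n + 1) * br_y) / (2 * n)
    return lo_x <= point[0] <= hi_x and lo_y <= point[1] <= hi_y

def filter_boxes(boxes, centers, top_k=70):
    """Keep boxes confirmed by a center keypoint and rescore them.

    boxes:   list of (tl_x, tl_y, br_x, br_y, tl_score, br_score, cls)
    centers: list of (x, y, score, cls), already in input-image coordinates
    """
    centers = sorted(centers, key=lambda c: c[2], reverse=True)[:top_k]
    kept = []
    for x1, y1, x2, y2, s_tl, s_br, cls in boxes:
        n = 3 if ((x2 - x1) * (y2 - y1)) ** 0.5 < 150 else 5  # scale-aware region
        matches = [c for c in centers
                   if c[3] == cls and in_central_region((x1, y1, x2, y2), (c[0], c[1]), n)]
        if not matches:
            continue                                  # no supporting center: drop the box
        s_ce = max(c[2] for c in matches)             # assumption: take the best matching center
        kept.append((x1, y1, x2, y2, (s_tl + s_br + s_ce) / 3, cls))
    return kept

boxes = [(0, 0, 90, 90, 0.9, 0.6, 1), (100, 100, 190, 190, 0.9, 0.9, 1)]
centers = [(45, 45, 0.6, 1), (110, 110, 0.9, 1)]
kept = filter_boxes(boxes, centers)
print(kept)
```

The second box has higher corner scores but no center keypoint inside its Central Region, so it is discarded, which is exactly the kind of incorrect box a corner-only detector would keep.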
Training
Input & Output Size
- input size: 511×511
- output size: 128×128
Data Augmentation
Same as CornerNet
Inferencing
Single-Scale Testing
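Feed the original image and its horizontally flipped copy to the network at the original resolution
Multi-Scale Testing
Feed the original and flipped images to the network at resolution scales of $[0.6, 1.0, 1.2, 1.5, 1.8]$
Steps
- select the top 70 center keypoints, top 70 top-left corners, and top 70 bottom-right corners, and build the bounding boxes from them
  For details, see [Generating BBox](#Generating BBox)
- flip the bounding boxes detected on the flipped image back and merge them with the boxes from the original image
- Post-Processing: Soft-NMS
- keep the top-100 bounding boxes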
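The merge and post-processing steps can be sketched as below. The notes only say "Soft-NMS", so the Gaussian variant, `sigma=0.5`, the score threshold, and the `image_width - x` flip convention are all assumptions of this sketch. It is also class-agnostic for brevity.

```python
import numpy as np

def unflip_boxes(boxes: np.ndarray, image_width: float) -> np.ndarray:
    """Map (x1, y1, x2, y2) boxes found on a horizontally flipped image back to the original frame."""
    out = boxes.astype(float).copy()
    out[:, 0] = image_width - boxes[:, 2]  # new left edge comes from the old right edge
    out[:, 2] = image_width - boxes[:, 0]  # new right edge comes from the old left edge
    return out

def iou_one_to_many(box: np.ndarray, others: np.ndarray) -> np.ndarray:
    """IoU between one box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_others = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_box + area_others - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001, max_keep=100):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them."""
    boxes = np.asarray(boxes, dtype=float).copy()
    scores = np.asarray(scores, dtype=float).copy()
    kept_boxes, kept_scores = [], []
    while len(scores) > 0 and len(kept_boxes) < max_keep:
        i = int(np.argmax(scores))                 # current best box
        best = boxes[i]
        kept_boxes.append(best)
        kept_scores.append(scores[i])
        boxes = np.delete(boxes, i, axis=0)
        scores = np.delete(scores, i)
        if len(scores) == 0:
            break
        scores = scores * np.exp(-(iou_one_to_many(best, boxes) ** 2) / sigma)
        keep = scores > score_thresh               # drop boxes whose score decayed to almost zero
        boxes, scores = boxes[keep], scores[keep]
    return np.array(kept_boxes), np.array(kept_scores)

restored = unflip_boxes(np.array([[10.0, 5.0, 30.0, 25.0]]), image_width=100)
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept_boxes, kept_scores = soft_nms(boxes, scores)
print(restored, kept_scores)
```

In the toy run, the second box overlaps heavily with the best one, so its score is decayed and it falls behind the distant third box instead of being removed outright. `max_keep=100` mirrors the final top-100 step.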
Ablation Experiment
Incorrect Bounding Box Reduction
Inference Speed
The cost of exploring visual patterns is small.
With a lighter backbone, CenterNet (CenterNet511-52) is both faster and more accurate than CornerNet (CornerNet511-104).
Center Pooling Ablation
- Conclusion:
  Center Pooling substantially improves the AP of large objects
- Reason:
  - Center Pooling extracts richer internal visual patterns
  - larger objects contain more internal visual patterns
Cascade Corner Pooling Ablation
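- Conclusion:
  - because large objects have rich internal visual patterns, Cascade Corner Pooling can see more objects
  - overly rich internal visual patterns can reduce its sensitivity to boundaries, leading to inaccurate bounding boxes
  - the incorrect bounding boxes can be suppressed with Center Pooling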
Central Region Exploration Ablation
- Conclusion:
  It improves the overall AP, with the largest gain on small objects
- Reason:
  The center keypoint of a small object is easier to locate
Error Analysis
- The exploration of visual patterns relies on center keypoints ==> missing a center keypoint makes CenterNet lose the visual patterns of the bounding box
- Center keypoint detection still has a lot of room for improvement
Metric AP & AR & FD
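AP: Average Precision rate
Computed over all categories and averaged over 10 IoU thresholds (i.e. $0.5 : 0.05 : 0.95$)
Reflects how many high-quality bounding boxes the network predicts (usually $\text{IoU} \ge 0.5$)
The most important metric on the MS-COCO dataset
AR: maximum Recall rate
Computed over a fixed number of detections per image and averaged over all categories and the 10 IoU thresholds
FD: False Discovery rate
Reflects the proportion of incorrect bounding boxes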
$$\text{FD} = 1 - \text{AP}$$
Small & Medium & Large
- small object: $\text{area} < 32^2$
- medium object: $32^2 < \text{area} < 96^2$
- large object: $\text{area} > 96^2$
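As a quick illustration, the three buckets can be written as a small helper. The function name `coco_size_bucket` is made up, and assigning the exact boundary values $32^2$ and $96^2$ to the larger bucket is an arbitrary choice of this sketch.

```python
def coco_size_bucket(area: float) -> str:
    """Classify an object by its area in pixels into small / medium / large."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

print(coco_size_bucket(30 * 30), coco_size_bucket(50 * 50), coco_size_bucket(100 * 100))
```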
Math
Central Region
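For reference, the Central Region in the paper is defined from the corners of the bounding box. In the sketch below, $(tl_x, tl_y)$ and $(br_x, br_y)$ are the top-left and bottom-right corners of the box, $(ctl_x, ctl_y)$ and $(cbr_x, cbr_y)$ are the top-left and bottom-right corners of the Central Region, and $n$ is an odd number that controls the region size. It is written from memory of the paper and should be checked against the original.

```latex
\begin{cases}
ctl_x = \dfrac{(n+1)\,tl_x + (n-1)\,br_x}{2n} \\[6pt]
ctl_y = \dfrac{(n+1)\,tl_y + (n-1)\,br_y}{2n} \\[6pt]
cbr_x = \dfrac{(n-1)\,tl_x + (n+1)\,br_x}{2n} \\[6pt]
cbr_y = \dfrac{(n-1)\,tl_y + (n+1)\,br_y}{2n}
\end{cases}
```

For $n = 3$ this gives the middle third of the box in each dimension, e.g. $ctl_x = (2\,tl_x + br_x)/3$, and for $n = 5$ the middle fifth, e.g. $ctl_x = (3\,tl_x + 2\,br_x)/5$.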
Loss Function
It consists of:
- Detection Loss
  - Corner Detection Loss $\text{L}_{\text{det}}^{\text{co}}$
  - Center Detection Loss $\text{L}_{\text{det}}^{\text{ce}}$
- Pull & Push Loss
  Applied to corners only
  - Pull Loss $\text{L}_{\text{pull}}^{\text{co}}$
  - Push Loss $\text{L}_{\text{push}}^{\text{co}}$
- Offset Loss
  - Corner Offset Loss $\text{L}_{\text{off}}^{\text{co}}$
  - Center Offset Loss $\text{L}_{\text{off}}^{\text{ce}}$
Loss weights:
- $\alpha = \beta = 0.1$ (Pull Loss and Push Loss)
- $\gamma = 1$ (Offset Loss)
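Assembled from the terms and weights listed above, the overall training objective can be written as follows. Grouping the two offset terms under a single $\gamma$ follows the paper's formulation as I recall it, so treat this as a sketch to verify against the original.

```latex
\text{L} = \text{L}_{\text{det}}^{\text{co}} + \text{L}_{\text{det}}^{\text{ce}}
         + \alpha\,\text{L}_{\text{pull}}^{\text{co}} + \beta\,\text{L}_{\text{push}}^{\text{co}}
         + \gamma\left(\text{L}_{\text{off}}^{\text{co}} + \text{L}_{\text{off}}^{\text{ce}}\right)
```

With $\alpha = \beta = 0.1$ and $\gamma = 1$, the detection and offset terms enter at full weight while the corner-grouping terms are down-weighted by a factor of 10.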
Use Yourself
……
Related Work
Anchor-Based Method
Introduction
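Anchor-based methods have 2 key points:
- place anchors with predefined sizes and ratios
- regress the positive bounding boxes according to the ground truth
Drawbacks
- a large number of anchors is needed (to keep a sufficiently high IoU with the ground-truth boxes)
- the sizes and ratios of the anchors have to be designed by hand (which brings many hyperparameters to tune)
- anchors are not aligned with the ground truth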
KeyPoint-Based Method
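Here this mainly means CornerNet
Introduction
I.e. an object is represented by a pair of corners
Drawbacks
- relatively weak ability to refer to global information
  In other words, it is sensitive to the boundary information of objects
- it cannot tell which pairs of keypoints should represent an object
  For details, see [Problem to Solve](#Problem to Solve)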
Two-Stage Method
Steps
- Extract RoIs ==> stage-1
- classify & regress RoIs ==> stage-2
Models
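RCNN:
- obtains RoIs with selective search
- uses a CNN as the classifier
SPP-Net & Fast-RCNN:
- extract RoIs from the feature map
Faster-RCNN:
- uses an RPN to regress the anchors, which enables end-to-end training
Mask-RCNN:
- Faster-RCNN + mask-prediction branch
- performs detection and segmentation at the same time
R-FCN:
- replaces the FC layers with position-sensitive score maps
Cascade RCNN:
Trains a sequence of detectors with increasing IoU thresholds, which addresses 2 problems:
- overfitting at training time
- quality mismatch at inference time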
One-stage Method
Common weakness of one-stage methods: no additional look at the cropped region
Steps
Directly classify and regress the anchor boxes
Models
YOLOv1:
- image ==> S×S grid
- uses no anchors and learns the bounding box size directly
YOLOv2:
- goes back to using more anchors
- uses a new bounding box regression method
SSD:
- classifies and regresses on feature maps from different convolutional stages
DSSD:
- SSD + deconvolution ==> combines low-level and high-level features
R-SSD:
- applies pooling and deconvolution to different feature layers ==> combines low-level and high-level features
RON:
- reverse connection
- objectness prior
RefineDet:
- refines the locations and sizes twice, inheriting the merits of both one-stage and two-stage methods
CornerNet:
- keypoint-based method
- represents an object with a pair of corners
Problems
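- In Cascade Corner Pooling, how is the maximum along the boundary direction found when pooling along the internal direction?
- What exactly do AP and AR mean?
- Why is CornerNet weak at referring to the global information of an object?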