Paper notes: "SSD: Single Shot MultiBox Detector"

2017/2/15 first reading

Abstract

SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location.
At prediction time, the network generates scores for each category in each default box and produces adjustments to the box to better match the object shape.
It combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.

Introduction

SSD does not resample pixels or features for bounding box hypotheses.
improvements:
(1) using a small convolutional filter to predict object categories and offsets in bounding box locations
(2) using separate predictors (filters) for different aspect ratio detections, and applying these filters to multiple feature maps from the later stages of a network in order to perform detection at multiple scales
(3) The core of SSD is predicting category scores and box offsets for a fixed set of default bounding boxes using small convolutional filters applied to feature maps.
(4) To achieve high detection accuracy, predictions of different scales are produced from feature maps of different scales, and predictions are explicitly separated by aspect ratio.

2 The Single Shot Detector (SSD)

2.1 Model

Multi-scale feature maps for detection

convolutional feature layers decrease in size progressively and allow predictions of detections at multiple scales.

Convolutional predictors for detection
each feature layer can produce a fixed set of detection predictions using a set of convolutional filters.
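
As a rough sketch of what "a fixed set of detection predictions" means in numbers: with k default boxes per location and c class scores plus 4 offsets per box, an m x n feature map needs (c + 4) * k filters and yields (c + 4) * k * m * n outputs. The function name and the sample values below are illustrative assumptions, not taken from these notes.

```python
def predictor_output_size(m, n, k, c):
    """Return (filters, outputs) for an m x n feature map.

    k: default boxes per feature map location
    c: class scores per box; each box also gets 4 offsets
    """
    filters = (c + 4) * k          # one small conv filter per output channel
    outputs = filters * m * n      # the filters are applied at every location
    return filters, outputs


# Illustrative values only: a 38 x 38 map, 4 boxes per location, 21 classes.
filters, outputs = predictor_output_size(m=38, n=38, k=4, c=21)
print(filters, outputs)
```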

Default boxes and aspect ratios

Discretize the space of possible box shapes densely.

A set of default bounding boxes is associated with each feature map cell. The default boxes tile the feature map in a convolutional manner, so the position of each box relative to its corresponding cell is fixed.
at each feature map cell:
-> predict the offsets relative to the default box shapes in the cell,
-> predict the per-class scores that indicate the presence of a class instance in each of those boxes.
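
A minimal sketch of the tiling and of how predicted offsets adjust a default box, assuming boxes are stored as (cx, cy, w, h) in [0, 1] image coordinates, centers sit at ((j + 0.5) / fk, (i + 0.5) / fk), and offsets are applied as a center shift scaled by the box size plus a log-space size change. The function names are hypothetical, and real implementations may add extra scaling factors.

```python
import math


def default_box_centers(fk):
    """Centers of the cells of an fk x fk feature map, in [0, 1] coordinates."""
    return [((j + 0.5) / fk, (i + 0.5) / fk)
            for i in range(fk) for j in range(fk)]


def decode(default_box, offsets):
    """Apply predicted offsets (l_cx, l_cy, l_w, l_h) to a default box."""
    d_cx, d_cy, d_w, d_h = default_box
    l_cx, l_cy, l_w, l_h = offsets
    return (d_cx + l_cx * d_w,      # shift center, relative to box width
            d_cy + l_cy * d_h,      # shift center, relative to box height
            d_w * math.exp(l_w),    # rescale width in log space
            d_h * math.exp(l_h))    # rescale height in log space


centers = default_box_centers(2)
print(centers)
print(decode((0.5, 0.5, 0.2, 0.4), (0.1, 0.0, 0.0, 0.0)))
```

Zero offsets leave the default box unchanged, which is why the position of each box relative to its cell can stay fixed.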

2.2 Training

ground truth information needs to be assigned to specific outputs in the fixed set of detector outputs.
training also involves choosing the set of default boxes and scales,
as well as the hard negative mining(?) and data augmentation strategies
Matching strategy
match between default boxes and ground truth boxes
select from default boxes that vary over location, aspect ratio, and scale
begin by matching each ground truth box to the default box with the best jaccard overlap (as in MultiBox[7])(?), and then match default boxes to any ground truth with jaccard overlap higher than a threshold (0.5)
this allows the network to predict high scores for multiple overlapping default boxes rather than requiring it to pick only the one with maximum overlap.
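
The two matching steps can be sketched as follows, assuming boxes are (xmin, ymin, xmax, ymax) tuples and at least one default box exists. The helper names are hypothetical; when two ground truth boxes share the same best default box, this simplified version lets the later one win.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(defaults, truths, threshold=0.5):
    """Return a list where entry i is the ground truth index for default box i, or -1."""
    matches = [-1] * len(defaults)
    # Threshold step: a default box takes its best ground truth if overlap > threshold.
    for i, d in enumerate(defaults):
        overlaps = [jaccard(d, g) for g in truths]
        if overlaps:
            j = max(range(len(truths)), key=overlaps.__getitem__)
            if overlaps[j] > threshold:
                matches[i] = j
    # Best-overlap step: every ground truth keeps its best default box.
    # Applied last so that it overrides the threshold step.
    for j, g in enumerate(truths):
        best = max(range(len(defaults)), key=lambda i: jaccard(defaults[i], g))
        matches[best] = j
    return matches


defaults = [(0, 0, 4, 4), (0, 0, 4, 3), (0, 0, 2, 2), (11, 11, 14, 14)]
truths = [(0, 0, 4, 4), (10, 10, 12, 12)]
print(match(defaults, truths))
```

In this toy input the first ground truth is matched by two default boxes, and the second one still gets a match even though its best overlap is far below 0.5.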
Training objective
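The SSD training objective is derived from the MultiBox objective but is extended to handle multiple object categories.
$x_{ij}^p \in \{0, 1\}$ is an indicator for matching the i-th default box to the j-th ground truth box of category p: it is 1 when the two are matched and 0 otherwise.
With the matching strategy above, one ground truth box can be matched to several default boxes:

$\sum_i x_{ij}^p \geq 1$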

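The overall objective loss function is a weighted sum of two parts, the localization loss (loc) and the confidence loss (conf):
$L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)$
where N is the number of matched default boxes.
The localization loss (loc) is a smooth L1 loss between the predicted box (l) and the ground truth box (g) parameters.
The confidence loss (conf) is the softmax loss over multiple class confidences (c).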
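
A small numeric sketch of the two loss terms and their combination, assuming the per-box terms have already been computed and collected into lists, and that the loss is defined as 0 when no box is matched. The function names and the default alpha = 1.0 are assumptions for illustration.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_loss(scores, target):
    """Negative log-softmax of the target class, computed stably."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


def ssd_loss(conf_terms, loc_terms, n_matched, alpha=1.0):
    """(1 / N) * (L_conf + alpha * L_loc), with the loss set to 0 when N == 0."""
    if n_matched == 0:
        return 0.0
    return (sum(conf_terms) + alpha * sum(loc_terms)) / n_matched


print(smooth_l1(0.5), smooth_l1(-2.0))
print(softmax_loss([2.0, 0.5, 0.1], target=0))
print(ssd_loss([1.0, 2.0], [0.5], n_matched=2))
```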
Choosing scales and aspect ratios for default boxes
we use both the lower and upper feature maps for detection

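design the tiling of default boxes so that specific feature maps learn to be responsive to particular scales of the objects
the scale of the default boxes changes with the feature map they are attached to
with 5 aspect ratios plus one extra box of aspect ratio 1 at a larger scale, this results in 6 default boxes per feature map location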
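
A sketch of how the scales and the 6 box shapes per location can be computed, assuming the scale grows linearly from s_min = 0.2 on the lowest map to s_max = 0.9 on the highest, width and height are s * sqrt(a) and s / sqrt(a) for aspect ratio a, and the sixth box has aspect ratio 1 with scale sqrt(s_k * s_{k+1}). The function names and m = 6 are illustrative; how the extra scale is chosen for the last map is left to the caller here.

```python
import math


def scales(m, s_min=0.2, s_max=0.9):
    """Linearly spaced scales for m feature maps (m >= 2)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def box_shapes(s_k, s_next, ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3)):
    """(width, height) of the default boxes at one location of one feature map."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    extra = math.sqrt(s_k * s_next)   # additional box with aspect ratio 1
    shapes.append((extra, extra))
    return shapes


s = scales(6)
print([round(v, 2) for v in s])
for w, h in box_shapes(s[0], s[1]):
    print(round(w, 3), round(h, 3))
```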

For example, in Fig. 1, the dog is matched to a default box in the 4×4 feature map, but not to any default boxes in the 8×8 feature map. This is because those boxes have different scales and do not match the dog box, and therefore are considered as negatives during training.

Hard negative mining
After the matching step, negative examples far outnumber positive ones.
Instead of using all the negative examples, we sort them using the highest confidence loss for each default box and pick the top ones so that the ratio between the negatives and positives is at most 3:1. We found that this leads to faster optimization and a more stable training.
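
The selection can be sketched like this, assuming one confidence loss value and one positive/negative flag per default box. The function name is hypothetical.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return the indices of the negatives kept for training.

    Negatives are ranked by confidence loss (highest first) and cut off so
    that there are at most neg_pos_ratio negatives per positive.
    """
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[:neg_pos_ratio * num_pos])


losses = [0.1, 2.0, 0.3, 0.9, 0.05, 1.5, 0.7]
positive = [True, False, False, False, False, False, False]
print(hard_negative_mining(losses, positive))
```

With one positive, only the three negatives with the highest loss are kept; the remaining negatives do not contribute to the confidence loss.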

Data augmentation
Varying the input data makes the model more robust.
keep the overlapped part of the ground truth box if its center is in the sampled patch(?)
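
One possible reading of this rule, as a sketch: a ground truth box survives only if its center lies inside the sampled patch, and the surviving box is clipped to the patch. Boxes and patch are assumed to be (xmin, ymin, xmax, ymax); shifting the result into patch coordinates is an extra assumption, and the function name is hypothetical.

```python
def crop_boxes(boxes, patch):
    """Keep boxes whose center is inside the patch, clipped to the patch."""
    px0, py0, px1, py1 = patch
    kept = []
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if px0 <= cx <= px1 and py0 <= cy <= py1:
            # Clip to the patch, then express in patch-relative coordinates.
            kept.append((max(x0, px0) - px0, max(y0, py0) - py0,
                         min(x1, px1) - px0, min(y1, py1) - py0))
    return kept


boxes = [(0, 0, 4, 4), (8, 8, 12, 12)]
print(crop_boxes(boxes, patch=(1, 1, 6, 6)))
```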

3 Experiment Results

VGG16 pre-trained
