Matching strategy
During training we need to determine which default boxes correspond to a ground truth detection and train the network accordingly. For each ground truth box we select from default boxes that vary over location, aspect ratio, and scale. We begin by matching each ground truth box to the default box with the best Jaccard overlap (as in MultiBox [7]). Unlike MultiBox, we then match default boxes to any ground truth with Jaccard overlap higher than a threshold (0.5). This simplifies the learning problem, allowing the network to predict high scores for multiple overlapping default boxes rather than requiring it to pick only the one with maximum overlap.
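The two matching steps described above can be sketched in plain Python. This is a minimal illustration, not the reference implementation: the function names `jaccard` and `match`, the corner box format `(xmin, ymin, xmax, ymax)`, and the rule that a default box overlapping several ground truths keeps the one with the highest overlap are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(defaults, truths, threshold=0.5):
    """Return, per default box, the index of its matched ground truth, or -1."""
    assigned = [-1] * len(defaults)
    if not defaults or not truths:
        return assigned

    # Threshold matching: a default box is matched to the ground truth it
    # overlaps most, provided that overlap exceeds the threshold (assumption:
    # one ground truth per default box).
    for i, d in enumerate(defaults):
        overlaps = [jaccard(d, g) for g in truths]
        j = max(range(len(truths)), key=overlaps.__getitem__)
        if overlaps[j] > threshold:
            assigned[i] = j

    # Best-overlap matching: every ground truth claims its single best default
    # box, even below the threshold. Applied last so it cannot be overwritten.
    for j, g in enumerate(truths):
        best = max(range(len(defaults)), key=lambda i: jaccard(defaults[i], g))
        assigned[best] = j
    return assigned


defaults = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 4, 4)]
truths = [(0, 0, 10, 10), (48, 48, 58, 58)]
print(match(defaults, truths))  # → [0, 0, 1, -1]
```

In this example the first ground truth is matched to two default boxes (overlaps 1.0 and about 0.68), while the second ground truth is matched only through the best-overlap rule, since its best overlap (about 0.47) is below the threshold.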
Training objective
The SSD training objective is derived from the MultiBox objective [7,8] but is extended to handle multiple object categories. Let $x_{ij}^p = \{1, 0\}$ be an indicator for matching the $i$-th default box to the $j$-th ground truth box of category $p$. In the matching strategy above, we can have $\sum_i x_{ij}^p \ge 1$.
The overall objective loss function is a weighted sum of the localization loss (loc) and the confidence loss (conf):
$$L(x,c,l,g)=\frac{1}{N}\left(L_{conf}(x,c)+\alpha L_{loc}(x,l,g)\right)$$
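The SSD loss function consists of two parts: the localization loss of the object boxes and the class confidence loss. $l$ and $g$ are the position parameters of the predicted box and the ground truth box, respectively.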
In the objective above, $N$ is the number of matched default boxes. If $N=0$, we set the loss to 0. The localization loss is a Smooth L1 loss between the predicted box ($l$) and the ground truth box ($g$) parameters. Similar to Faster R-CNN, we regress to offsets for the center $(cx, cy)$ of the default bounding box ($d$) and for its width ($w$) and height ($h$).
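To make the localization term concrete, here is a minimal sketch of the offset encoding and the Smooth L1 sum over matched boxes. The text above does not spell out the encoding, so the form used here (center offsets scaled by the default box size, log ratios for width and height, as is common in Faster R-CNN-style detectors) is an assumption, as are the names `encode`, `smooth_l1`, and `loc_loss` and the `(cx, cy, w, h)` box format.

```python
import math


def encode(g, d):
    """Regression targets for ground truth g relative to default box d.

    Both boxes are (cx, cy, w, h). Assumed encoding: center offsets are
    divided by the default box size, sizes are expressed as log ratios.
    """
    return (
        (g[0] - d[0]) / d[2],
        (g[1] - d[1]) / d[3],
        math.log(g[2] / d[2]),
        math.log(g[3] / d[3]),
    )


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def loc_loss(pred_offsets, defaults, truths, assigned):
    """Sum of Smooth L1 over the four offsets of every matched default box.

    assigned[i] is the ground truth index matched to default box i, or -1
    for unmatched boxes, which contribute nothing to this term.
    """
    total = 0.0
    for l, d, j in zip(pred_offsets, defaults, assigned):
        if j < 0:
            continue
        target = encode(truths[j], d)
        total += sum(smooth_l1(lm - gm) for lm, gm in zip(l, target))
    return total
```

Note that the value returned by `loc_loss` is not yet normalized; the division by $N$ happens once in the overall objective.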
The confidence loss is the softmax loss over multiple class confidences ($c$), and the weight term $\alpha$ is set to 1 by cross validation. $\alpha$ is a weighting coefficient that controls how much the localization loss contributes to the total loss.
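The pieces of the objective can be combined as in the sketch below. It assumes that class index 0 denotes background, that every default box contributes to the confidence term, and that the localization sum has already been computed and is passed in as a number. None of these details are fixed by the text above, and the names `softmax_loss` and `ssd_loss` are hypothetical.

```python
import math


def softmax_loss(scores, label):
    """Softmax cross-entropy for one box: -log(softmax(scores)[label])."""
    m = max(scores)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]


def ssd_loss(conf_scores, labels, loc_total, alpha=1.0):
    """L = (L_conf + alpha * L_loc) / N, with L = 0 when N == 0.

    conf_scores: per-box list of raw class scores.
    labels:      per-box class index; 0 means background (assumption).
    loc_total:   unnormalized localization loss over matched boxes.
    """
    n = sum(1 for y in labels if y > 0)  # N: number of matched default boxes
    if n == 0:
        return 0.0
    l_conf = sum(softmax_loss(s, y) for s, y in zip(conf_scores, labels))
    return (l_conf + alpha * loc_total) / n
```

With `alpha=1.0` the two terms are weighted equally, which matches the setting reported above; a larger `alpha` would emphasize box regression over classification.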