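Main ideas
- Proposes a loss function based on Gaussian distributions to handle intra-class uncertainty
- The Gaussian mean acts as the class center, and the variance describes the uncertainty
- By analogy with the inter-class margin, an intra-class margin is introduced
Principle
Class imbalance makes classification in polar coordinates harder.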
L-GM Loss:
$$
\begin{aligned}
\mathcal{L}_{\text{L-GM}} = &-\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{-d_{z_{i}}(1+\alpha)}}{\sum_{m}^{M} e^{-d_{m}\left(1+R\left(m=z_{i}\right) \alpha\right)}} \\
&+\lambda\left(d_{z_{i}}+\frac{1}{2} \log \left|\Lambda_{z_{i}}\right|\right)
\end{aligned}
$$
where $R(m = z_i)$ is presumably the indicator of $m = z_i$ (1 for the true class, 0 otherwise), so the margin $\alpha$ applies only to the true-class term, and

$$
d_{m}=\frac{1}{2}\left(x_{i}-\mu_{m}\right)^{T} \Lambda_{m}^{-1}\left(x_{i}-\mu_{m}\right), \quad m \in[1, M]
$$
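The L-GM formula above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes diagonal covariances (so $\Lambda_m$ is stored as a vector of variances), reads $R(m = z_i)$ as an indicator, and averages the likelihood term over the batch. The function name `lgm_loss` and the default values of `alpha` and `lam` are hypothetical.

```python
import numpy as np

def lgm_loss(x, z, mu, var, alpha=0.3, lam=0.1):
    """Sketch of the L-GM loss with diagonal covariances (an assumption).

    x: (N, D) features, z: (N,) integer labels,
    mu: (M, D) class means, var: (M, D) per-dimension variances.
    """
    n = x.shape[0]
    diff = x[:, None, :] - mu[None, :, :]             # (N, M, D)
    d = 0.5 * np.sum(diff ** 2 / var[None], axis=2)   # (N, M) distances d_m
    onehot = np.eye(mu.shape[0])[z]                   # indicator R(m = z_i)
    logits = -d * (1.0 + alpha * onehot)              # margin on the true class only
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    cls = -log_prob[np.arange(n), z].mean()
    log_det = np.log(var).sum(axis=1)                 # log|Lambda_m| for diagonal Lambda
    # Likelihood term, averaged over the batch (assumed; the formula leaves this open).
    lkd = (d[np.arange(n), z] + 0.5 * log_det[z]).mean()
    return cls + lam * lkd
```

For samples that sit exactly on their class mean with unit variances, the likelihood term vanishes and only the softmax part remains.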
Center Loss:
$$
\mathcal{L}_{\text{Center}}=-\frac{1}{N} \sum_{i}^{N} \log \frac{e^{w_{y_{i}}^{T} x_{i}}}{\sum_{k=0}^{K} e^{w_{k}^{T} x_{i}}}+\frac{\lambda}{2} \sum_{i}^{N}\left\|x_{i}-c_{y_{i}}\right\|_{2}^{2}
$$
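As a rough illustration of the Center Loss formula above, the following NumPy sketch combines a bias-free softmax cross-entropy with the squared distance to each sample's class center. The function name `center_loss`, the argument layout, and the default `lam` are assumptions made for this example.

```python
import numpy as np

def center_loss(x, y, w, centers, lam=0.01):
    """Softmax cross-entropy plus the center penalty.

    x: (N, D) features, y: (N,) integer labels,
    w: (K, D) classifier weights (no bias, matching the formula),
    centers: (K, D) class centers c_k.
    """
    n = x.shape[0]
    logits = x @ w.T                                      # w_k^T x_i for every class
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -log_prob[np.arange(n), y].mean()
    # (lambda / 2) * sum_i ||x_i - c_{y_i}||^2, summed (not averaged) as in the formula
    pull = 0.5 * lam * np.sum((x - centers[y]) ** 2)
    return ce + pull
```

Unlike the Gaussian-based losses in these notes, the center penalty uses a plain Euclidean distance, so every class is implicitly treated as having the same isotropic spread.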
ICU Loss:
$$
\begin{aligned}
\mathcal{L}_{cls} &=-\frac{1}{N} \sum_{i=1}^{N} \log p\left(z_{i} \mid x_{i}\right) \\
&=-\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{-d_{z_{i}}}}{\sum_{k}^{K} e^{-d_{k}}}+\lambda \mathcal{L}_{reg}
\end{aligned}
$$
where

$$
d_{k}=\frac{1}{2}\left[\left(x_{i}-\mu_{k}\right)^{T} \Sigma_{k}^{-1}\left(x_{i}-\mu_{k}\right)+\ln \left|\Sigma_{k}\right|\right], \quad k \in[1, K]
$$
$$
\lambda \mathcal{L}_{reg}=\sum_{k=1}^{K} \lambda_{1}\left|\mu_{k}-\bar{\mu}_{N_{k}}\right|^{2}+\lambda_{2}\left|\sigma_{k}^{2}-\bar{\sigma}_{N_{k}}^{2}\right|^{2}
$$
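The regularizer above can be sketched as follows. The notes do not define $\bar{\mu}_{N_k}$ and $\bar{\sigma}_{N_k}^{2}$; this sketch assumes they are the empirical mean and variance of the $N_k$ samples of class $k$ in the current batch, and that variances are stored per dimension. The function name `icu_regularizer` and the default weights are hypothetical.

```python
import numpy as np

def icu_regularizer(x, z, mu, var, lam1=1.0, lam2=1.0):
    """Sketch of lambda * L_reg, assuming the barred terms are batch statistics.

    x: (N, D) features, z: (N,) integer labels,
    mu: (K, D) learned means, var: (K, D) learned per-dimension variances.
    """
    total = 0.0
    for k in range(mu.shape[0]):
        xk = x[z == k]
        if xk.shape[0] == 0:        # class absent from this batch: skip it
            continue
        mean_k = xk.mean(axis=0)    # assumed meaning of \bar{mu}_{N_k}
        var_k = xk.var(axis=0)      # assumed meaning of \bar{sigma}^2_{N_k}
        total += lam1 * np.sum((mu[k] - mean_k) ** 2)
        total += lam2 * np.sum((var[k] - var_k) ** 2)
    return total
```

Under this reading, the term pulls the learned Gaussian parameters toward the statistics actually observed for each class.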
ICU Loss is roughly equivalent to Center Loss + L-GM Loss + variance regularization.
Adding the inter-class and intra-class margin terms then gives:
$$
\mathcal{L}_{ICU}=-\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{-d_{z_{i}}(1+\alpha)}}{\sum_{k, k \neq z_{i}}^{K} e^{-d_{k}}+e^{-d_{z_{i}}(1+\alpha)}}
$$
$$
d_{z_{i}}=\frac{1}{2}\left[\left(x_{i}-\mu_{z_{i}}\right)^{T} \Sigma_{z_{i}}^{-1}\left(x_{i}-\mu_{z_{i}}\right)+\ln (1+\gamma)\left|\Sigma_{z_{i}}\right|\right]
$$
$$
d_{k}=\frac{1}{2}\left[\left(x_{i}-\mu_{k}\right)^{T} \Sigma_{k}^{-1}\left(x_{i}-\mu_{k}\right)+\ln \left|\Sigma_{k}\right|\right], \quad k \in[1, K]
$$
where $\alpha$ controls the inter-class margin and $\gamma$ controls the intra-class margin.
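To make the roles of $\alpha$ and $\gamma$ concrete, here is a minimal NumPy sketch of the margin form of the ICU loss. It rests on two assumptions that the notes do not settle: covariances are diagonal, and the term $\ln (1+\gamma)\left|\Sigma_{z_i}\right|$ is read as $\ln\big((1+\gamma)\left|\Sigma_{z_i}\right|\big)$; the alternative reading $(1+\gamma)\ln\left|\Sigma_{z_i}\right|$ would need a different line for `d_true`. The function name `icu_loss` and the default values are hypothetical.

```python
import numpy as np

def icu_loss(x, z, mu, var, alpha=0.3, gamma=0.1):
    """Sketch of the ICU loss with margins, diagonal covariances assumed.

    x: (N, D) features, z: (N,) integer labels,
    mu: (K, D) class means, var: (K, D) per-dimension variances.
    """
    n, k = x.shape[0], mu.shape[0]
    diff = x[:, None, :] - mu[None, :, :]
    maha = np.sum(diff ** 2 / var[None], axis=2)   # (N, K) Mahalanobis terms
    log_det = np.log(var).sum(axis=1)              # ln|Sigma_k| for diagonal Sigma
    d = 0.5 * (maha + log_det[None, :])            # d_k for every class
    # Assumed reading: ln((1 + gamma)|Sigma|) = ln(1 + gamma) + ln|Sigma|
    d_true = d + 0.5 * np.log1p(gamma)
    onehot = np.eye(k)[z]
    # True class uses d_{z_i}(1 + alpha); other classes use plain d_k.
    logits = -np.where(onehot == 1, d_true * (1.0 + alpha), d)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n), z].mean()
```

With `alpha=0` and `gamma=0` this reduces to the plain Gaussian softmax term of $\mathcal{L}_{cls}$ without the regularizer; raising either margin makes the true-class term harder to satisfy in this setup.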