Starting from cross entropy:
$$Loss = L(y, \hat{p}) = -y\log(\hat{p}) - (1-y)\log(1-\hat{p})$$
Taking binary classification as an example:
$$L = \frac{1}{N}\left(\sum_{y_i=1}^{m} -\log(\hat{p}) + \sum_{y_i=0}^{n} -\log(1-\hat{p})\right)$$

where $m$ and $n$ are the numbers of positive and negative samples and $N = m + n$.
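As a quick sanity check, here is a minimal NumPy sketch of this batch-averaged loss (the names `y_true`, `p_hat` and the `eps` clipping are illustrative additions, not from the original):

```python
import numpy as np

def binary_cross_entropy(y_true, p_hat, eps=1e-12):
    """Mean of -y*log(p) - (1-y)*log(1-p) over a batch."""
    p_hat = np.clip(p_hat, eps, 1.0 - eps)  # keep log() away from 0
    return np.mean(-y_true * np.log(p_hat) - (1 - y_true) * np.log(1 - p_hat))

y_true = np.array([1, 1, 0, 0, 0])
p_hat = np.array([0.9, 0.6, 0.2, 0.1, 0.4])
print(binary_cross_entropy(y_true, p_hat))  # ≈ 0.291
```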
When the positive and negative sample counts are imbalanced, the training signal is dominated by the class with more samples.
Weighting the two terms gives the balanced cross entropy:
$$L = \frac{1}{N}\left(\sum_{y_i=1}^{m} -\alpha\log(\hat{p}) + \sum_{y_i=0}^{n} -(1-\alpha)\log(1-\hat{p})\right)$$

$$\frac{\alpha}{1-\alpha} = \frac{n}{m}$$
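Solving the constraint above gives $\alpha = \frac{n}{m+n}$. A sketch of the weighted loss follows; as an illustrative assumption, $m$ and $n$ are counted from the current batch (the post does not specify where the counts come from):

```python
import numpy as np

def balanced_cross_entropy(y_true, p_hat, eps=1e-12):
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    m = np.sum(y_true == 1)  # number of positive samples
    n = np.sum(y_true == 0)  # number of negative samples
    alpha = n / (m + n)      # satisfies alpha / (1 - alpha) = n / m
    return np.mean(-alpha * y_true * np.log(p_hat)
                   - (1 - alpha) * (1 - y_true) * np.log(1 - p_hat))
```

Note that this choice gives the rarer class the larger weight: if positives are scarce ($m \ll n$), then $\alpha$ is close to 1 and the positive term is up-weighted.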
Focal Loss
$$L_{fl} = \begin{cases} -(1-\hat{p})^\gamma \log(\hat{p}) & \text{if } y=1 \\ -\hat{p}^\gamma \log(1-\hat{p}) & \text{if } y=0 \end{cases}$$
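A minimal sketch of this two-branch form (the function name and `eps` clipping are mine; $\gamma = 2$ is the default reported in the original Focal Loss paper):

```python
import numpy as np

def focal_loss(y_true, p_hat, gamma=2.0, eps=1e-12):
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    loss = np.where(
        y_true == 1,
        -(1 - p_hat) ** gamma * np.log(p_hat),  # y = 1 branch
        -p_hat ** gamma * np.log(1 - p_hat),    # y = 0 branch
    )
    return np.mean(loss)
```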
Or, written more compactly:
$$p_t = \begin{cases} \hat{p} & \text{if } y=1 \\ 1-\hat{p} & \text{otherwise} \end{cases}$$

$$L_{fl} = -(1-p_t)^\gamma \log(p_t)$$
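The compact form produces the same numbers as the two-branch form; a sketch that mirrors it directly (again with illustrative names):

```python
import numpy as np

def focal_loss_pt(y_true, p_hat, gamma=2.0, eps=1e-12):
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p_hat, 1 - p_hat)  # p_t as defined above
    return np.mean(-(1 - p_t) ** gamma * np.log(p_t))
```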
Focal loss weights the loss by how hard each sample is to classify, so the total loss concentrates on hard examples.
Classes with few samples are usually harder to classify, so focal loss helps improve accuracy on minority classes; however, hard examples are not confined to the minority classes.
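To see the focusing effect numerically: with $\gamma = 2$, the modulating factor $(1-p_t)^\gamma$ scales a well-classified example ($p_t = 0.9$) by 0.01 but a hard one ($p_t = 0.5$) by 0.25, a 25x difference:

```python
for p_t in (0.9, 0.7, 0.5, 0.3):
    print(f"p_t = {p_t:.1f} -> modulating factor = {(1 - p_t) ** 2:.2f}")
```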