Label Smoothing was proposed by Christian Szegedy et al. (2015) to reduce overfitting during training.
Motivation:
one-hot encoding:
In classification tasks, labels are usually represented with one-hot encoding: the true class gets 1 and every other class gets 0, and the model's softmax output is trained to match this target distribution.
For an N-class problem, each class corresponds to an N-dimensional vector:
# labels and their corresponding one-hot encodings:
label=[0,1,2,3,4,5,6]
one_hot_encode = [[1,0,0,0,0,0,0],
[0,1,0,0,0,0,0],
[0,0,1,0,0,0,0],
[0,0,0,1,0,0,0],
[0,0,0,0,1,0,0],
[0,0,0,0,0,1,0],
[0,0,0,0,0,0,1]]
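For reference, the same table can be produced programmatically; a minimal sketch using PyTorch's F.one_hot (PyTorch itself is an assumption here, the original only lists the literal table):

import torch
import torch.nn.functional as F

labels = torch.arange(7)                            # label = [0, 1, 2, 3, 4, 5, 6]
one_hot_encode = F.one_hot(labels, num_classes=7)   # 7 x 7 matrix, one row per class
print(one_hot_encode)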
Drawback:
Consider the cross-entropy loss function:
loss = -\sum_{k=1}^{K} q(k|x)\log p(k|x)

where p(k|x) is the predicted probability of class k for input x, and q(k|x) is the ground-truth distribution (here one-hot).
The more accurate the classification, the closer the loss gets to 0; conversely, as the predicted probability of the true class approaches 0, the loss grows toward positive infinity. However, the annotations are not necessarily perfectly accurate, so learning against one-hot targets with cross-entropy does not necessarily optimize the right objective: the model is pushed to make the predicted probability of the labeled class arbitrarily close to 1, becoming overconfident, which can lead to overfitting.
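As a minimal sketch of this loss (assuming PyTorch; the tensor values are made up for illustration), the manual sum matches PyTorch's built-in cross-entropy:

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0]])  # hypothetical model output
target = torch.tensor([0])                                     # true class index

log_p = F.log_softmax(logits, dim=-1)          # log p(k|x)
q = F.one_hot(target, num_classes=7).float()   # q(k|x), one-hot
manual_loss = -(q * log_p).sum(dim=-1)         # -sum_k q(k|x) log p(k|x)

print(manual_loss.item())                      # ~0.649
print(F.cross_entropy(logits, target).item())  # same value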
Label Smoothing:
The ground-truth distribution is replaced with a smoothed version:

q'(k|x) = (1-\epsilon)\,\delta_{k,y} + \frac{\epsilon}{K}

where \delta_{k,y} is 1 when k equals the true label y and 0 otherwise, K is the number of classes, and \epsilon is commonly set to 0.1.
import torch

def label_smoothing(inputs, eps=0.1):
    # inputs: one-hot targets of shape (..., K); returns the smoothed distribution
    K = inputs.size(-1)  # number of classes
    return (1 - eps) * inputs + eps / K  # spreads eps uniformly, so each row still sums to 1
# with eps = 0.1 and K = 7, the one-hot encodings become (values rounded):
[[0.9143,0.0143,0.0143,0.0143,0.0143,0.0143,0.0143],
[0.0143,0.9143,0.0143,0.0143,0.0143,0.0143,0.0143],
[0.0143,0.0143,0.9143,0.0143,0.0143,0.0143,0.0143],
[0.0143,0.0143,0.0143,0.9143,0.0143,0.0143,0.0143],
[0.0143,0.0143,0.0143,0.0143,0.9143,0.0143,0.0143],
[0.0143,0.0143,0.0143,0.0143,0.0143,0.9143,0.0143],
[0.0143,0.0143,0.0143,0.0143,0.0143,0.0143,0.9143]]
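In practice, recent PyTorch versions expose label smoothing directly on the loss, so the targets never need to be smoothed by hand; a brief usage sketch with made-up inputs:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # built-in since PyTorch 1.10

logits = torch.randn(4, 7)          # toy batch: 4 examples, K = 7 classes
labels = torch.randint(0, 7, (4,))  # integer class labels
loss = criterion(logits, labels)    # cross-entropy against smoothed targets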