The softmax function can be written as

$$S_i=\frac{e^i}{\sum_j e^j}$$

where $e^i$ is shorthand for $e^{z_i}$, the exponential of the $i$-th logit.
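This definition can be sketched in plain Python (the max-subtraction for numerical stability is an addition not stated in the text; it cancels out and does not change the result):

```python
import math

def softmax(z):
    """Map logits z to probabilities S_i = e^{z_i} / sum_j e^{z_j}."""
    m = max(z)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three values in (0, 1)
print(sum(probs))  # they sum to 1 (up to floating point)
```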
The cross-entropy loss is

$$L=-\sum_k t_k\ln y_k$$

where $t_k=1$ for the target class and $t_k=0$ for every other class.
When the target is the $i$-th class, we have $t_i=1$, and $y_i$ denotes the softmax value computed for that class.
The loss function then reduces to

$$Loss_i=-\ln y_i.$$
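With a one-hot target, the sum collapses to a single term, which is easy to sketch (the probability values below are made-up example numbers):

```python
import math

def cross_entropy(probs, target):
    # With one-hot t_k, -sum_k t_k * ln(y_k) collapses to -ln(y_target).
    return -math.log(probs[target])

# Example: softmax output for 3 classes, target class 2.
probs = [0.09, 0.24, 0.67]
print(cross_entropy(probs, 2))  # -ln(0.67)
```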
By definition,

$$y_i=\frac{e^i}{\sum_j e^j}.$$

The softmax values lie in $(0,1)$ and sum to 1, so

$$\frac{e^i}{\sum_j e^j}=1-\frac{\sum_{j\neq i}e^j}{\sum_j e^j}.$$
Now differentiate the loss with respect to the $i$-th logit:

$$\frac{\partial Loss_i}{\partial i}=-\frac{\partial \ln y_i}{\partial i}=\frac{\partial\left(-\ln\frac{e^i}{\sum_j e^j}\right)}{\partial i}$$

$$=-\frac{1}{\frac{e^i}{\sum_j e^j}}\cdot\frac{\partial\left(\frac{e^i}{\sum_j e^j}\right)}{\partial i}$$

$$=-\frac{\sum_j e^j}{e^i}\cdot\frac{\partial\left(1-\frac{\sum_{j\neq i}e^j}{\sum_j e^j}\right)}{\partial i}$$

$$=-\frac{\sum_j e^j}{e^i}\cdot\left(-\sum_{j\neq i}e^j\right)\cdot\frac{\partial\left(\frac{1}{\sum_j e^j}\right)}{\partial i}$$

(the factor $\sum_{j\neq i}e^j$ does not depend on the $i$-th logit, so it can be pulled out of the derivative)

$$=\frac{\sum_j e^j\cdot\sum_{j\neq i}e^j}{e^i}\cdot(-1)\cdot\frac{e^i}{\left(\sum_j e^j\right)^2}$$

$$=-\frac{\sum_{j\neq i}e^j}{\sum_j e^j}$$

$$=\frac{e^i}{\sum_j e^j}-1$$

$$=y_i-1$$
So in the forward pass we only need to compute $y_i$; subtracting 1 from it gives the gradient used in the backward update for the target logit. (For a non-target logit $j\neq i$, an analogous derivation gives $\frac{\partial Loss_i}{\partial j}=y_j$.)
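As a sanity check, the closed-form gradient $y_i-1$ can be compared against a central finite difference of the loss (a sketch, with arbitrary example logits):

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def loss(z, i):
    # Cross-entropy with one-hot target i: -ln(softmax(z)[i]).
    return -math.log(softmax(z)[i])

z, i, eps = [0.5, -1.2, 2.0], 2, 1e-6
# Analytic gradient from the derivation: y_i - 1 for the target logit.
analytic = softmax(z)[i] - 1
# Central finite difference on the target logit.
zp = z[:]; zp[i] += eps
zm = z[:]; zm[i] -= eps
numeric = (loss(zp, i) - loss(zm, i)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # True
```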
Derivative of the softmax function