Math Formula Derivation: the Normalized Exponential Function (softmax)
Core formula
$$
S\left( y_i \right) =\frac{e^{y_i}}{\sum_{j} e^{y_j}}
$$
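Below is a minimal Python sketch of the formula above, using only the standard library. The function name `softmax` is illustrative. The max-subtraction step is my own addition and does not come from the derivation. Subtracting max(y) from every input leaves S(y_i) unchanged, because the common factor cancels between numerator and denominator. It also keeps `math.exp` from overflowing on large inputs.

```python
import math
from typing import List


def softmax(y: List[float]) -> List[float]:
    """Compute S(y_i) = e^{y_i} / sum_j e^{y_j} for every index i."""
    # Shift by the maximum: e^{y_i - m} / sum_j e^{y_j - m} equals the
    # unshifted ratio, but the largest exponent becomes e^0 = 1, so no overflow.
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


p = softmax([1.0, 2.0, 3.0])
print(p)       # each entry lies in (0, 1)
print(sum(p))  # 1, up to floating-point rounding
```

Without the shift, `math.exp(1000.0)` raises `OverflowError`. With the shift, `softmax([1000.0, 1001.0])` returns two finite probabilities.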
Derivative (same function, with the inputs written as a_k and the outputs as p_i):
$$
p_i=\frac{e^{a_i}}{\sum_{k=1}^{N}e^{a_k}}
$$
$$
\frac{\partial p_i}{\partial a_j}=\frac{\partial}{\partial a_j}\left( \frac{e^{a_i}}{\sum_{k=1}^{N}e^{a_k}} \right)
$$
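Before splitting into cases, we can estimate this partial derivative numerically, which gives a reference to check the closed forms against. Below is a minimal Python sketch using a central difference. The names `numeric_partial` and `eps` are hypothetical choices made for illustration.

```python
import math
from typing import List


def softmax(a: List[float]) -> List[float]:
    """p_i = e^{a_i} / sum_k e^{a_k}, shifted by max(a) for numerical stability."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def numeric_partial(a: List[float], i: int, j: int, eps: float = 1e-6) -> float:
    """Central-difference estimate of dp_i/da_j.

    Only a_j is perturbed; every other input is held fixed, which is what
    the partial derivative means.
    """
    plus = list(a)
    minus = list(a)
    plus[j] += eps
    minus[j] -= eps
    return (softmax(plus)[i] - softmax(minus)[i]) / (2 * eps)


a = [0.5, -1.0, 2.0]
print(numeric_partial(a, 0, 0))  # i == j
print(numeric_partial(a, 0, 2))  # i != j
```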
Let (with x standing for the variable a_j):
$$
g\left( x \right) =e^{a_i}
$$
$$
h\left( x \right) =\sum_{k=1}^{N}e^{a_k}
$$
Since p_i = g(x)/h(x), apply the quotient rule:
$$
f'\left( x \right) =\frac{g'\left( x \right) h\left( x \right) -h'\left( x \right) g\left( x \right)}{h^2\left( x \right)}
$$
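The case split below comes from the two derivatives that the quotient rule needs. Only the k = j term of h depends on a_j, while g depends on a_j only when i = j:

$$
g'\left( x \right) =\frac{\partial e^{a_i}}{\partial a_j}=\begin{cases} e^{a_i}, & i=j\\ 0, & i\ne j \end{cases}\qquad\qquad h'\left( x \right) =\frac{\partial}{\partial a_j}\sum_{k=1}^{N}e^{a_k}=e^{a_j}
$$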
Consider two cases.
When i = j (the result is positive):
$$
\begin{aligned}
\frac{\partial}{\partial a_j}\left( \frac{e^{a_i}}{\sum_{k=1}^{N}e^{a_k}} \right) &=\frac{e^{a_i}\sum_{k=1}^{N}e^{a_k}-e^{a_j}e^{a_i}}{\left( \sum_{k=1}^{N}e^{a_k} \right) ^2}\\
&=\frac{e^{a_i}\left( \sum_{k=1}^{N}e^{a_k}-e^{a_j} \right)}{\left( \sum_{k=1}^{N}e^{a_k} \right) ^2}\\
&=\frac{e^{a_i}}{\sum_{k=1}^{N}e^{a_k}}\times \frac{\sum_{k=1}^{N}e^{a_k}-e^{a_j}}{\sum_{k=1}^{N}e^{a_k}}\\
&=p_i\left( 1-p_j \right) =p_j\left( 1-p_j \right) \qquad (i=j)
\end{aligned}
$$
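Below is a quick numerical check of the i = j result, written as a sketch. The function names `diag_partial` and `central_diff` and the test vector are hypothetical. The closed form p_i(1 − p_i) is compared with a central-difference estimate of ∂p_i/∂a_i.

```python
import math
from typing import List


def softmax(a: List[float]) -> List[float]:
    """p_i = e^{a_i} / sum_k e^{a_k}, shifted by max(a) for numerical stability."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def diag_partial(a: List[float], i: int) -> float:
    """Closed form from the i == j case: dp_i/da_i = p_i * (1 - p_i)."""
    p = softmax(a)
    return p[i] * (1.0 - p[i])


def central_diff(a: List[float], i: int, j: int, eps: float = 1e-6) -> float:
    """Central-difference estimate of dp_i/da_j (only a_j is perturbed)."""
    plus, minus = list(a), list(a)
    plus[j] += eps
    minus[j] -= eps
    return (softmax(plus)[i] - softmax(minus)[i]) / (2 * eps)


a = [0.5, -1.0, 2.0]
for i in range(len(a)):
    # closed form vs. numerical estimate; they should agree closely
    print(i, diag_partial(a, i), central_diff(a, i, i))
```

Because p_i lies in (0, 1), p_i(1 − p_i) is always positive and at most 1/4, which matches the "positive" label on this case.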
When i ≠ j (the result is negative):
$$
\begin{aligned}
\frac{\partial}{\partial a_j}\left( \frac{e^{a_i}}{\sum_{k=1}^{N}e^{a_k}} \right) &=\frac{0\times \sum_{k=1}^{N}e^{a_k}-e^{a_j}e^{a_i}}{\left( \sum_{k=1}^{N}e^{a_k} \right) ^2}\\
&=\frac{e^{a_i}}{\sum_{k=1}^{N}e^{a_k}}\times \frac{-e^{a_j}}{\sum_{k=1}^{N}e^{a_k}}\\
&=-p_ip_j
\end{aligned}
$$
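The two cases can be folded into a single expression using the Kronecker delta δ_ij, which is 1 when i = j and 0 otherwise: ∂p_i/∂a_j = p_i(δ_ij − p_j). In matrix form, this is the Jacobian J = diag(p) − p pᵀ. The Python sketch below builds that matrix and checks every entry against central differences. The helper names `softmax_jacobian` and `central_diff` are hypothetical, not library functions.

```python
import math
from typing import List


def softmax(a: List[float]) -> List[float]:
    """p_i = e^{a_i} / sum_k e^{a_k}, shifted by max(a) for numerical stability."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(a: List[float]) -> List[List[float]]:
    """J[i][j] = dp_i/da_j = p_i * (delta_ij - p_j).

    Diagonal entries reduce to p_i (1 - p_i); off-diagonal entries to -p_i p_j.
    """
    p = softmax(a)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]


def central_diff(a: List[float], i: int, j: int, eps: float = 1e-6) -> float:
    """Central-difference estimate of dp_i/da_j (only a_j is perturbed)."""
    plus, minus = list(a), list(a)
    plus[j] += eps
    minus[j] -= eps
    return (softmax(plus)[i] - softmax(minus)[i]) / (2 * eps)


a = [0.5, -1.0, 2.0]
J = softmax_jacobian(a)
for row in J:
    print([round(x, 6) for x in row])
```

J is symmetric, and each row and column sums to zero. Column sums vanish because Σ_i p_i = 1 for every input. Row sums vanish because adding the same constant to every a_k leaves p unchanged.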
Note:
- A gripe about CSDN's LaTeX compatibility: a lot of the syntax is buggy.