Derivative of softmax + cross_entropy
$$
loss = -\sum_{j=1}^{3} y_j \log(p_j)
$$
where $y = [1, 0, 0]$ and $p_i = \frac{e^{z_i}}{\sum_{k=1}^3 e^{z_k}}$. Since $y_2 = y_3 = 0$, only the $j = 1$ term remains, so:
$$
loss = -y_1 \log(p_1) = -\log(p_1)
$$
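The forward pass above can be sketched in pure Python. This is a minimal illustration, not the author's code. It assumes the logits are a plain list of floats, and the values in `z` are hypothetical. The helper names `softmax` and `cross_entropy` are our own. The max logit is subtracted before exponentiating, which leaves the softmax unchanged but avoids overflow. With the one-hot `y` from the text, the loss reduces to $-\log(p_1)$.

```python
import math

def softmax(z):
    """Softmax over a list of logits; subtracting max(z) avoids overflow without changing the result."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, y):
    """loss = -sum_j y_j * log(p_j); terms with y_j == 0 are skipped so log(0) is never evaluated."""
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p) if yj != 0)

z = [2.0, 1.0, 0.1]  # hypothetical logits z_1, z_2, z_3
y = [1, 0, 0]        # one-hot target, as in the text
p = softmax(z)
loss = cross_entropy(p, y)
print(loss, -math.log(p[0]))  # p[0] is p_1; the two values agree
```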
$$
\begin{aligned}
\frac{\partial loss}{\partial z_1} &= \frac{\partial loss}{\partial p_1} \cdot \frac{\partial p_1}{\partial z_1} \\
&= -\frac{1}{p_1} \cdot \frac{\partial}{\partial z_1}\left(\frac{e^{z_1}}{\sum_{k=1}^3 e^{z_k}}\right) \\
&= -\frac{1}{p_1}\left[e^{z_1} \cdot (-1) \cdot \frac{1}{\left(\sum_{k=1}^3 e^{z_k}\right)^2} \cdot e^{z_1} + \frac{e^{z_1}}{\sum_{k=1}^3 e^{z_k}}\right] \\
&= -\frac{1}{p_1}\left[(-1) \cdot \left(\frac{e^{z_1}}{\sum_{k=1}^3 e^{z_k}}\right)^2 + \frac{e^{z_1}}{\sum_{k=1}^3 e^{z_k}}\right] \\
&= -\frac{1}{p_1}\left[(-1) \cdot p_1^2 + p_1\right] \\
&= p_1 - 1
\end{aligned}
$$
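The result $\frac{\partial loss}{\partial z_1} = p_1 - 1$ can be checked numerically with a central finite difference. The following is a sketch under two assumptions: the logits are hypothetical, and the step `h = 1e-6` is an arbitrary choice. Index `0` in Python corresponds to $z_1$ and $p_1$.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss_fn(z):
    # With y = [1, 0, 0] the loss is -log(p_1); p_1 is softmax(z)[0]
    return -math.log(softmax(z)[0])

z = [2.0, 1.0, 0.1]  # hypothetical logits
h = 1e-6             # finite-difference step (arbitrary choice)
z_plus = [z[0] + h] + z[1:]
z_minus = [z[0] - h] + z[1:]
numeric = (loss_fn(z_plus) - loss_fn(z_minus)) / (2 * h)
analytic = softmax(z)[0] - 1  # p_1 - 1 from the derivation
print(numeric, analytic)
```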
$$
\begin{aligned}
\frac{\partial loss}{\partial z_2} &= \frac{\partial loss}{\partial p_1} \cdot \frac{\partial p_1}{\partial z_2} \\
&= -\frac{1}{p_1} \cdot \frac{\partial}{\partial z_2}\left(\frac{e^{z_1}}{\sum_{k=1}^3 e^{z_k}}\right) \\
&= -\frac{1}{p_1}\left((-1) \cdot \frac{e^{z_1}}{\left(\sum_{k=1}^3 e^{z_k}\right)^2} \cdot e^{z_2}\right) \\
&= -\frac{1}{p_1}\left((-1) \cdot \frac{e^{z_1}}{\sum_{k=1}^3 e^{z_k}} \cdot \frac{e^{z_2}}{\sum_{k=1}^3 e^{z_k}}\right) \\
&= -\frac{1}{p_1}\left((-1) \cdot p_1 \cdot p_2\right) \\
&= p_2
\end{aligned}
$$
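The step $\frac{\partial p_1}{\partial z_2} = -p_1 p_2$ is one entry of the softmax Jacobian, $\frac{\partial p_i}{\partial z_j} = p_i(\delta_{ij} - p_j)$. The sketch below builds that Jacobian in closed form and compares it with finite differences. It then applies the chain rule from the text to recover $p_2$. The logits and the helper names `softmax_jacobian` and `numeric_jacobian` are hypothetical, and the code is illustrative only.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """J[i][j] = dp_i/dz_j = p_i * (delta_ij - p_j)."""
    p = softmax(z)
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)] for i in range(n)]

def numeric_jacobian(z, h=1e-6):
    """Central-difference approximation of the same Jacobian, one column per perturbed z_j."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp = list(z)
        zp[j] += h
        zm = list(z)
        zm[j] -= h
        pp, pm = softmax(zp), softmax(zm)
        for i in range(n):
            J[i][j] = (pp[i] - pm[i]) / (2 * h)
    return J

z = [2.0, 1.0, 0.1]  # hypothetical logits
p = softmax(z)
J = softmax_jacobian(z)
# Chain rule from the text: dloss/dz_2 = (-1/p_1) * dp_1/dz_2, which should equal p_2
grad_z2 = (-1 / p[0]) * J[0][1]
print(grad_z2, p[1])
```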
By the same argument:
$$
\frac{\partial loss}{\partial z_3} = p_3
$$
In summary:
$$
\frac{\partial loss}{\partial z_i} = \begin{cases} p_i - 1, & \text{if } y_i = 1 \text{ (target class)} \\ p_i, & \text{if } y_i = 0 \end{cases}
$$

That is, $\frac{\partial loss}{\partial z_i} = p_i - y_i$.
When $y = [1, 0]$
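The summary $\frac{\partial loss}{\partial z_i} = p_i - y_i$ gives the whole gradient in one line. The sketch below compares that closed form against a central-difference gradient of the full loss. The logits and the helper names `ce_loss`, `ce_grad` and `numeric_grad` are hypothetical.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def ce_loss(z, y):
    p = softmax(z)
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p) if yj != 0)

def ce_grad(z, y):
    """Closed form from the summary: dloss/dz_i = p_i - y_i."""
    return [pi - yi for pi, yi in zip(softmax(z), y)]

def numeric_grad(f, z, h=1e-6):
    """Central-difference gradient of a scalar function f at z."""
    g = []
    for i in range(len(z)):
        zp = list(z)
        zp[i] += h
        zm = list(z)
        zm[i] -= h
        g.append((f(zp) - f(zm)) / (2 * h))
    return g

z = [2.0, 1.0, 0.1]  # hypothetical logits
y = [1, 0, 0]
analytic = ce_grad(z, y)
numeric = numeric_grad(lambda v: ce_loss(v, y), z)
print(analytic)
print(numeric)
```

Because $\sum_i p_i = 1$ and $y$ is one-hot, the gradient components always sum to zero.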
Derivative of sigmoid + binary_cross_entropy:
$$
loss = -\sum_{j=1}^{3}\left[y_j \log(p_j) + (1 - y_j)\log(1 - p_j)\right]
$$
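Here each output goes through its own sigmoid, and the loss sums one binary term per output. The sketch below evaluates the loss exactly as written. It also evaluates an algebraically equivalent form on the logits, $\max(z, 0) - z\,y + \log(1 + e^{-|z|})$, which never takes $\log(0)$. The stable form is our addition and is not from the original. The logits and helper names are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^{-x})."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bce_from_probs(p, y):
    """loss = -sum_j [y_j log p_j + (1 - y_j) log(1 - p_j)], as written in the text."""
    return -sum(yj * math.log(pj) + (1 - yj) * math.log(1 - pj) for pj, yj in zip(p, y))

def bce_from_logits(z, y):
    """Same loss rewritten on logits: max(z, 0) - z*y + log(1 + e^{-|z|})."""
    return sum(max(zj, 0.0) - zj * yj + math.log1p(math.exp(-abs(zj))) for zj, yj in zip(z, y))

z = [2.0, 1.0, 0.1]  # hypothetical logits
y = [1, 0, 0]
p = [sigmoid(v) for v in z]
print(bce_from_probs(p, y), bce_from_logits(z, y))
```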
where $y = [1, 0, 0]$, $p_i = \frac{1}{1 + e^{-z_i}}$, and $\frac{\partial p_i}{\partial z_i} = p_i (1 - p_i)$. Each $p_j$ depends only on its own $z_j$, so only the $j = i$ term of the sum contributes to $\frac{\partial loss}{\partial z_i}$:
$$
\begin{aligned}
\frac{\partial loss}{\partial z_i} &= \frac{\partial loss}{\partial p_i} \cdot \frac{\partial p_i}{\partial z_i} \\
&= -\left(\frac{y_i}{p_i} + (-1) \cdot \frac{1 - y_i}{1 - p_i}\right) \cdot p_i (1 - p_i)
\end{aligned}
$$
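The chain rule above relies on $\frac{\partial p_i}{\partial z_i} = p_i(1 - p_i)$ for $p_i = \frac{1}{1 + e^{-z_i}}$. The sketch below checks that identity against a central finite difference at a few points. The test points are arbitrary.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^{-x})."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

h = 1e-6
results = []
for x in [-3.0, -0.5, 0.0, 0.7, 4.0]:  # arbitrary test points
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    p = sigmoid(x)
    analytic = p * (1 - p)
    results.append((x, numeric, analytic))
    print(f"x={x:+.1f}  numeric={numeric:.8f}  p(1-p)={analytic:.8f}")
```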
When $y_i = 1$, i.e. the target class:
$$
\begin{aligned}
\frac{\partial loss}{\partial z_i} &= -\frac{1}{p_i} \cdot p_i (1 - p_i) \\
&= p_i - 1
\end{aligned}
$$
When $y_i = 0$, i.e. a non-target class:
$$
\begin{aligned}
\frac{\partial loss}{\partial z_i} &= \frac{1}{1 - p_i} \cdot p_i (1 - p_i) \\
&= p_i
\end{aligned}
$$
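Both cases are one expression. Expanding the bracket gives $-\left(y_i(1 - p_i) - (1 - y_i)p_i\right) = p_i - y_i$. The sketch below evaluates the unsimplified chain-rule product for both values of $y_i$ and compares it with $p_i - y_i$. The logits are arbitrary, and `grad_chain_rule` is a hypothetical helper name.

```python
import math

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def grad_chain_rule(z_i, y_i):
    """Unsimplified form: -(y_i/p_i + (-1)*(1-y_i)/(1-p_i)) * p_i * (1-p_i)."""
    p = sigmoid(z_i)
    dloss_dp = -(y_i / p + (-1) * (1 - y_i) / (1 - p))
    dp_dz = p * (1 - p)
    return dloss_dp * dp_dz

checks = []
for z_i in [-2.0, 0.3, 1.5]:  # arbitrary logits
    for y_i in [0, 1]:
        simplified = sigmoid(z_i) - y_i  # p_i - y_i
        checks.append((grad_chain_rule(z_i, y_i), simplified))
print(checks)
```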
In summary:
$$
\frac{\partial loss}{\partial z_i} = \begin{cases} p_i - 1, & \text{if } y_i = 1 \text{ (target class)} \\ p_i, & \text{if } y_i = 0 \end{cases}
$$

That is, $\frac{\partial loss}{\partial z_i} = p_i - y_i$, the same form as for softmax + cross_entropy.
When $y = [1, 0]$
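The closed form $p_i - y_i$ for sigmoid + binary_cross_entropy can be checked against a finite-difference gradient of the full loss. The sketch below does this under two assumptions: the logits are hypothetical, and the helper names `bce_loss` and `bce_grad` are our own.

```python
import math

def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bce_loss(z, y):
    p = [sigmoid(v) for v in z]
    return -sum(yj * math.log(pj) + (1 - yj) * math.log(1 - pj) for pj, yj in zip(p, y))

def bce_grad(z, y):
    """Closed form from the summary: dloss/dz_i = p_i - y_i, each p_i an independent sigmoid."""
    return [sigmoid(v) - yi for v, yi in zip(z, y)]

z = [2.0, 1.0, 0.1]  # hypothetical logits
y = [1, 0, 0]
h = 1e-6
numeric = []
for i in range(len(z)):
    zp = list(z)
    zp[i] += h
    zm = list(z)
    zm[i] -= h
    numeric.append((bce_loss(zp, y) - bce_loss(zm, y)) / (2 * h))
analytic = bce_grad(z, y)
print(analytic)
print(numeric)
```

The formula has the same form as in the softmax case. Here, however, each $p_i$ is computed independently, so unlike softmax the gradient components need not sum to zero.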