1. Derivative of MSELoss
- $y$: ground-truth value
- $p$: predicted value
$$
MSE_{loss} = \frac12\cdot\sum_{i=1}^n(y_i - p_i)^2
$$

$$
p_i = x_i\cdot w + b
$$

$$
\frac{\partial loss}{\partial w} = 2 \cdot \frac12 \cdot \sum_{i=1}^n(y_i - p_i) \cdot(-1)\cdot x_i = \sum_{i=1}^n(p_i - y_i)\cdot x_i
$$

$$
\frac{\partial loss}{\partial b} = 2 \cdot \frac12 \cdot \sum_{i=1}^n(y_i - p_i)\cdot(-1) = \sum_{i=1}^n(p_i - y_i)
$$

- Note: one way to arrive at the linear-regression loss function is to model the errors probabilistically as normally distributed and derive the loss from that model.
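The two MSE gradients can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the original derivation: the helper names (`mse_loss`, `mse_grads`, `numeric_grad`) and the sample data are made up for illustration. It compares the analytic formulas with central finite differences.

```python
def mse_loss(xs, ys, w, b):
    """Half the sum of squared errors, matching the 1/2 * sum definition."""
    return 0.5 * sum((y - (x * w + b)) ** 2 for x, y in zip(xs, ys))


def mse_grads(xs, ys, w, b):
    """Analytic gradients: sum((p - y) * x) for w and sum(p - y) for b."""
    residuals = [(x * w + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(r * x for r, x in zip(residuals, xs))
    grad_b = sum(residuals)
    return grad_w, grad_b


def numeric_grad(f, value, eps=1e-6):
    """Central finite difference of a scalar function f at `value`."""
    return (f(value + eps) - f(value - eps)) / (2 * eps)


# Hypothetical sample data and parameters, chosen only for the check.
xs = [0.5, -1.0, 2.0]
ys = [1.0, 0.0, 3.0]
w, b = 0.3, -0.2

grad_w, grad_b = mse_grads(xs, ys, w, b)
num_w = numeric_grad(lambda v: mse_loss(xs, ys, v, b), w)
num_b = numeric_grad(lambda v: mse_loss(xs, ys, w, v), b)

print(grad_w, num_w)  # analytic vs. numeric gradient for w
print(grad_b, num_b)  # analytic vs. numeric gradient for b
```

Because the loss is quadratic in `w` and `b`, the central difference agrees with the analytic value up to floating-point rounding.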
2. Derivative of BCELoss
$$
BCE_{loss} = -\sum_{i=1}^n \left[y_i\cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i)\right]
$$

$$
p_i = sigmoid(x_i)=\frac{1}{1 + e^{-x_i}}
$$

$$
\frac{\partial p_i}{\partial x_i} = p_i \cdot (1 - p_i)
$$

Only the $i$-th term of the sum depends on $x_i$, so the chain rule gives:

$$
\frac{\partial loss}{\partial x_i} = \frac{\partial loss}{\partial p_i} \cdot \frac{\partial p_i}{\partial x_i} = -\left(y_i\cdot \frac{1}{p_i} + (1 - y_i)\cdot \frac{1}{p_i - 1}\right)\cdot p_i \cdot (1 - p_i) = p_i - y_i
$$

So:

$$
\frac{\partial loss}{\partial x_i} = p_i - y_i
$$
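- Note: the binary cross-entropy loss function is derived by modeling the label probabilistically with a Bernoulli (0/1) distribution.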
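As a rough check of the BCE result, the sketch below compares the per-sample gradient $p_i - y_i$ with a central finite difference of the summed loss. It uses only the standard library; the function names and the sample logits and labels are hypothetical, introduced only for this illustration.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(xs, ys):
    """Summed binary cross-entropy over all samples (no averaging)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def bce_grad(xs, ys):
    """Analytic gradient with respect to each input x_i: p_i - y_i."""
    return [sigmoid(x) - y for x, y in zip(xs, ys)]


def numeric_grad_at(f, values, index, eps=1e-6):
    """Central finite difference of f with respect to values[index]."""
    up = list(values)
    down = list(values)
    up[index] += eps
    down[index] -= eps
    return (f(up) - f(down)) / (2 * eps)


# Hypothetical pre-sigmoid inputs and 0/1 labels.
logits = [0.8, -1.2, 0.1]
labels = [1.0, 0.0, 1.0]

analytic = bce_grad(logits, labels)
numeric = [
    numeric_grad_at(lambda v: bce_loss(v, labels), logits, i)
    for i in range(len(logits))
]

for a, n in zip(analytic, numeric):
    print(a, n)  # each pair should agree to several decimal places
```

Perturbing a single $x_i$ changes only the $i$-th term of the loss, which is why each gradient component carries no sum over samples.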
3. Derivative of CELoss
$$
CE_{loss} = -\sum_{i=1}^{k} y_i\cdot \log p_i
$$

$$
p_i = \frac {e^{z_i}}{\sum_{j=1}^k e^{z_j}} = softmax(z_i)
$$

Here $k$ is the number of output neurons (classes).
Consider the case of two output neurons as an example:

$$
loss = -(y_1 \cdot \log p_1 + y_2 \cdot \log p_2)
$$
$$
y = (y_1, y_2) = (0, 1), \quad p = (p_1, p_2) = \left(\frac {e^{z_1}}{e^{z_1} + e^{z_2}},\ \frac {e^{z_2}}{e^{z_1} + e^{z_2}}\right)
$$
$$
\frac {\partial loss}{\partial z_1} = \frac {\partial loss}{\partial p_1} \cdot \frac {\partial p_1}{\partial z_1} + \frac {\partial loss}{\partial p_2} \cdot \frac {\partial p_2}{\partial z_1}
$$
$$
\begin{aligned}
&= -\left(\frac{y_1}{p_1}\cdot \frac{e^{z_1}\sum - (e^{z_1})^2}{(\sum)^2} + \frac{y_2}{p_2}\cdot \frac{0 - e^{z_1}\cdot e^{z_2}}{(\sum)^2}\right) \\
&= -\left(\frac {y_1}{p_1} \cdot (p_1 - p_1^2) + \frac {y_2}{p_2} \cdot (-p_1 \cdot p_2)\right) \\
&= -(y_1 - p_1(y_1 + y_2)) \\
&= -(y_1 - p_1)
\end{aligned}
$$

Here $\sum$ is shorthand for $e^{z_1} + e^{z_2}$, and the last step uses $y_1 + y_2 = 1$.
Similarly:

$$
\frac {\partial loss}{\partial z_2} = -(y_2 - p_2)
$$
So the final result is:

$$
\frac {\partial loss}{\partial z} = p - y
$$

(This is the vector form. Coincidentally, it is the same as the derivative of the sigmoid loss with respect to $x$.)
- Note: the multi-class cross-entropy loss function is derived from information-entropy theory.
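The two-neuron result $\partial loss / \partial z = p - y$ can be checked numerically in the same way. This is a minimal standard-library sketch: the function names and the logit values are assumptions made for illustration, and the max-subtraction inside `softmax` is a common numerical-stability step that does not appear in the derivation and does not change the output.

```python
import math


def softmax(zs):
    """Softmax over a list of logits, shifted by max(zs) for stability."""
    shift = max(zs)
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def ce_loss(zs, ys):
    """Cross-entropy -sum(y_i * log p_i) with p = softmax(z)."""
    ps = softmax(zs)
    return -sum(y * math.log(p) for y, p in zip(ys, ps))


def ce_grad(zs, ys):
    """Analytic gradient with respect to the logits: p - y."""
    return [p - y for p, y in zip(softmax(zs), ys)]


def numeric_grad_at(f, values, index, eps=1e-6):
    """Central finite difference of f with respect to values[index]."""
    up = list(values)
    down = list(values)
    up[index] += eps
    down[index] -= eps
    return (f(up) - f(down)) / (2 * eps)


# Two output neurons with the one-hot label y = (0, 1) from the example;
# the logit values themselves are hypothetical.
zs = [0.4, -0.7]
ys = [0.0, 1.0]

analytic = ce_grad(zs, ys)
numeric = [
    numeric_grad_at(lambda v: ce_loss(v, ys), zs, i)
    for i in range(len(zs))
]

for a, n in zip(analytic, numeric):
    print(a, n)  # analytic p_i - y_i vs. finite-difference estimate
```

Since the softmax outputs sum to 1 and so do the one-hot labels, the gradient components sum to zero, which gives a quick extra consistency check.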