Deriving the Gradients of the MSELoss, BCELoss, and CELoss Loss Functions

1. Gradient of MSELoss
  • y: the ground-truth value
  • p: the predicted value
$$MSE_{loss} = \frac{1}{2}\cdot\sum_{i=1}^{n}(y_i - p_i)^2, \qquad p_i = x_i \cdot w + b$$
$$\frac{\partial loss}{\partial w} = 2 \cdot \frac{1}{2} \cdot \sum_{i=1}^{n}(y_i - p_i)\cdot(-1)\cdot x_i = \sum_{i=1}^{n}(p_i - y_i)\cdot x_i$$
$$\frac{\partial loss}{\partial b} = 2 \cdot \frac{1}{2} \cdot \sum_{i=1}^{n}(y_i - p_i)\cdot(-1) = \sum_{i=1}^{n}(p_i - y_i)$$
  • Note: one way the linear-regression (MSE) loss arises is from maximum-likelihood estimation after modeling the prediction errors as normally distributed.
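As a sanity check, the hand-derived gradients can be compared against autograd. A minimal PyTorch sketch, assuming the $\frac{1}{2}\sum$ form above (note that `nn.MSELoss` omits the $\frac{1}{2}$ factor and averages by default, so the loss is built explicitly here):

```python
import torch

torch.manual_seed(0)
x = torch.randn(8)                       # inputs x_i
y = torch.randn(8)                       # targets y_i
w = torch.randn((), requires_grad=True)  # scalar weight
b = torch.randn((), requires_grad=True)  # scalar bias

p = x * w + b                            # predictions p_i = x_i * w + b
loss = 0.5 * torch.sum((y - p) ** 2)     # MSE_loss = 1/2 * sum_i (y_i - p_i)^2
loss.backward()

# Hand-derived gradients from the formulas above
grad_w = torch.sum((p - y) * x).detach()  # sum_i (p_i - y_i) * x_i
grad_b = torch.sum(p - y).detach()        # sum_i (p_i - y_i)
print(torch.allclose(w.grad, grad_w), torch.allclose(b.grad, grad_b))  # True True
```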
2. Gradient of BCELoss

$$BCE_{loss} = -\sum_{i=1}^{n}\left[\,y_i \cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i)\,\right]$$
$$p_i = \mathrm{sigmoid}(x_i) = \frac{1}{1 + e^{-x_i}}$$
$$\frac{\partial p_i}{\partial x_i} = p_i \cdot (1 - p_i)$$
$$\frac{\partial loss}{\partial x_i} = \frac{\partial loss}{\partial p_i} \cdot \frac{\partial p_i}{\partial x_i} = -\left(\frac{y_i}{p_i} + \frac{1 - y_i}{p_i - 1}\right) \cdot p_i \cdot (1 - p_i) = p_i - y_i$$
So: $\frac{\partial loss}{\partial x_i} = p_i - y_i$ (only the $i$-th term of the sum depends on $x_i$, so no summation remains).

  • Note: the binary cross-entropy loss is derived from maximum-likelihood modeling under a Bernoulli (0/1) distribution.
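The same kind of check works for sigmoid + BCE; a minimal sketch of the summed, unreduced form that matches the formula above:

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, requires_grad=True)  # logits x_i
y = torch.randint(0, 2, (8,)).float()   # labels y_i in {0, 1}

p = torch.sigmoid(x)
# BCE_loss = -sum_i [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)]
loss = -torch.sum(y * torch.log(p) + (1 - y) * torch.log(1 - p))
loss.backward()

# Autograd agrees with the hand-derived per-element gradient p_i - y_i
print(torch.allclose(x.grad, (p - y).detach()))  # True
```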
3. Gradient of CELoss

$$CE_{loss} = -\sum_{i=1}^{n} y_i \cdot \log p_i, \qquad p_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}} = \mathrm{softmax}(z_i)$$
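In code, the softmax above is only a few lines; a minimal sketch (subtracting the max beforehand is a standard numerical-stability trick, not part of the derivation, and does not change the result):

```python
import torch

def softmax(z: torch.Tensor) -> torch.Tensor:
    e = torch.exp(z - z.max())  # shift by max(z) to avoid overflow
    return e / e.sum()

z = torch.tensor([1.0, 2.0, 3.0])
print(softmax(z))                                           # tensor([0.0900, 0.2447, 0.6652])
print(torch.allclose(softmax(z), torch.softmax(z, dim=0)))  # True
```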
Take two output neurons as an example:
$$loss = -(y_1 \cdot \log p_1 + y_2 \cdot \log p_2)$$
$$y = (y_1, y_2) = (0, 1), \qquad p = (p_1, p_2) = \left(\frac{e^{z_1}}{e^{z_1} + e^{z_2}},\; \frac{e^{z_2}}{e^{z_1} + e^{z_2}}\right)$$
$$\begin{aligned}
\frac{\partial loss}{\partial z_1} &= \frac{\partial loss}{\partial p_1} \cdot \frac{\partial p_1}{\partial z_1} + \frac{\partial loss}{\partial p_2} \cdot \frac{\partial p_2}{\partial z_1} \\
&= -\left(\frac{y_1}{p_1}\cdot \frac{e^{z_1}\Sigma - (e^{z_1})^2}{\Sigma^2} + \frac{y_2}{p_2}\cdot \frac{0 - e^{z_1}\cdot e^{z_2}}{\Sigma^2}\right) \qquad \left(\Sigma = e^{z_1} + e^{z_2}\right)\\
&= -\left(\frac{y_1}{p_1} \cdot (p_1 - p_1^2) + \frac{y_2}{p_2} \cdot (-p_1 \cdot p_2)\right) \\
&= -\bigl(y_1 - p_1(y_1 + y_2)\bigr) \\
&= -(y_1 - p_1) \qquad \left(\text{using } y_1 + y_2 = 1\right)
\end{aligned}$$
Similarly: $\frac{\partial loss}{\partial z_2} = -(y_2 - p_2)$
So, finally:
$$\frac{\partial loss}{\partial z} = p - y$$
(This is the vector form, and, quite coincidentally, it matches the result of differentiating the sigmoid/BCE loss with respect to $x$.)

  • Note: the multi-class cross-entropy loss is derived from information-entropy theory.
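A quick check of the vector-form result $\frac{\partial loss}{\partial z} = p - y$, using the two-neuron example with $y = (0, 1)$ (the logit values below are arbitrary):

```python
import torch

z = torch.tensor([0.3, -1.2], requires_grad=True)  # logits (arbitrary values)
y = torch.tensor([0.0, 1.0])                       # one-hot target from the example

p = torch.softmax(z, dim=0)
loss = -torch.sum(y * torch.log(p))                # CE_loss = -sum_i y_i * log(p_i)
loss.backward()

print(torch.allclose(z.grad, (p - y).detach()))    # True
```

In practice, `nn.CrossEntropyLoss` takes the raw logits $z$ directly and fuses the softmax and the log into one numerically stable call.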