3. Forward and Backward Propagation: Softmax

References

- cs231n Course Materials: Backprop
- Derivatives, Backpropagation, and Vectorization
- cs231n Lecture 4: Neural Networks and Backpropagation
- cs231n Assignment 2
- Notes: Batch Normalization and its Backpropagation

3. Softmax Loss Function

"""
Inputs:
    - X: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - Y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

Returns a tuple of:
	- L: Scalar giving the loss
	- dx: Gradient of the loss with respect to x
"""

$$L_i=-\log{\frac{e^{X_{i,y_i}}}{\sum_{j}e^{X_{i,j}}}}\tag{3.1}$$

$$L=\frac{1}{N}\sum_i{L_i}\tag{3.2}$$
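As a concrete check of Eqs. (3.1) and (3.2), here is a minimal NumPy sketch on a made-up 2×3 batch of scores (the values and the names `scores`/`labels` are illustrative, not from the assignment):

```python
import numpy as np

# Toy batch: N = 2 examples, C = 3 classes (made-up numbers for illustration).
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])  # correct class index for each row

# Eq. (3.1): L_i = -log( e^{X_{i,y_i}} / sum_j e^{X_{i,j}} )
exp_scores = np.exp(scores)
probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
L_i = -np.log(probs[np.arange(2), labels])

# Eq. (3.2): average over the batch
L = L_i.mean()
print(L_i, L)  # approximately [0.417, 0.220] and 0.319
```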
To prevent numerical overflow, implementations typically apply the following transformation:
$$\begin{aligned}L_i&=-\log{\frac{e^{X_{i,y_i}}}{\sum_{j}e^{X_{i,j}}}}\\&=-\log{\frac{e^{\max_j\{X_{i,j}\}}\,e^{\left(X_{i,y_i}-\max_j\{X_{i,j}\}\right)}}{e^{\max_j\{X_{i,j}\}}\sum_{j}e^{\left(X_{i,j}-\max_j\{X_{i,j}\}\right)}}}\\&=-\log{\frac{e^{\left(X_{i,y_i}-\max_j\{X_{i,j}\}\right)}}{\sum_{j}e^{\left(X_{i,j}-\max_j\{X_{i,j}\}\right)}}}\end{aligned}\tag{3.3}$$
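The shift matters in practice: for large scores `np.exp` overflows to `inf` and the naive ratio becomes `nan`, while the shifted form of Eq. (3.3) yields the same loss without overflow. A minimal sketch (made-up scores):

```python
import numpy as np

scores = np.array([[1000.0, 1001.0, 1002.0]])  # large scores
y = np.array([2])

# Naive Eq. (3.1): e^1000 overflows to inf, and inf/inf gives nan
naive = np.exp(scores)
naive_probs = naive / naive.sum(axis=1, keepdims=True)  # -> nan

# Stable Eq. (3.3): subtract the row maximum before exponentiating
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(1), y]).mean()  # finite, approximately 0.408
```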

The backward pass is derived as follows.
Since the transformation in Eq. (3.3) does not change the value of the function, the derivation below uses the original (untransformed) form.
There are two cases to consider:
(1) Gradient with respect to $X_{i,y_i}$
$$\begin{aligned}\frac{\partial{L}}{\partial{X_{i,y_i}}}&=\frac{1}{N}\frac{\partial{L_i}}{\partial{X_{i,y_i}}}\\&=-\frac{1}{N}\frac{\sum_j{e^{X_{i,j}}}}{e^{X_{i,y_i}}}\cdot\frac{e^{X_{i,y_i}}\sum_{j}e^{X_{i,j}}-\left(e^{X_{i,y_i}}\right)^2}{\left(\sum_{j}e^{X_{i,j}}\right)^2}\\&=\frac{1}{N}\frac{e^{X_{i,y_i}}-\sum_j{e^{X_{i,j}}}}{\sum_j{e^{X_{i,j}}}}\\&=\frac{1}{N}\left(\frac{e^{X_{i,y_i}}}{\sum_{j}e^{X_{i,j}}}-1\right)\end{aligned}\tag{3.4}$$

(2) Gradient with respect to $X_{i,k}$ ($k\neq y_i$)
$$\begin{aligned}\frac{\partial{L}}{\partial{X_{i,k}}}&=\frac{1}{N}\frac{\partial{L_i}}{\partial{X_{i,k}}}\\&=\frac{1}{N}\frac{\sum_j{e^{X_{i,j}}}}{e^{X_{i,y_i}}}\cdot\frac{e^{X_{i,y_i}}e^{X_{i,k}}}{\left(\sum_{j}e^{X_{i,j}}\right)^2}\\&=\frac{1}{N}\frac{e^{X_{i,k}}}{\sum_{j}e^{X_{i,j}}}\end{aligned}\tag{3.5}$$


Let
$$p_{i,k}=\frac{e^{X_{i,k}}}{\sum_{j}e^{X_{i,j}}}\tag{3.6}$$
Then
$$\frac{\partial{L}}{\partial{X_{i,y_i}}}=\frac{1}{N}\left(p_{i,y_i}-1\right)\tag{3.7}$$
$$\frac{\partial{L}}{\partial{X_{i,k}}}=\frac{1}{N}p_{i,k}\qquad(k\neq y_i)\tag{3.8}$$
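Putting Eqs. (3.2), (3.3), (3.7), and (3.8) together gives a vectorized `softmax_loss` that matches the interface in the docstring above. This is a sketch in the spirit of the cs231n assignment, not the official solution code:

```python
import numpy as np

def softmax_loss(X, y):
    """Softmax loss and gradient.

    Inputs:
    - X: scores, shape (N, C)
    - y: labels, shape (N,), with 0 <= y[i] < C

    Returns a tuple (L, dX).
    """
    N = X.shape[0]

    # Numerically stable probabilities p_{i,k}, Eqs. (3.3) and (3.6)
    shifted = X - X.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

    # Loss, Eqs. (3.1) and (3.2)
    L = -np.log(probs[np.arange(N), y]).mean()

    # Gradient, Eqs. (3.7) and (3.8): subtract 1 at the correct class, divide by N
    dX = probs.copy()
    dX[np.arange(N), y] -= 1.0
    dX /= N
    return L, dX
```

A quick sanity check is to compare `dX` against a numerical gradient of the loss (e.g., centered differences); the two should agree to several decimal places.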
