Deriving the LogSumExp Backward Pass

This post covers the LogSumExp function and its use in the Softmax of multi-class classification, focusing on how to compute the derivative of LogSumExp with respect to its inputs, including the notation and the full mathematical derivation.

I recently worked through Tianqi Chen's DeepLearningSystem course; part of HW2 asks you to derive the gradient of the LogSumExp (LSE for short) operator.
LSE is used very widely; for example, the Softmax in multi-class classification can be computed via LSE to avoid numerical overflow (a quick sketch of this follows below).
So this post works through the derivative of LSE (in a somewhat verbose way) and doubles as a bit of LaTeX practice 😄
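As a quick illustration of that Softmax point (a sketch only; it uses SciPy's `logsumexp` rather than anything from the course):

```python
import numpy as np
from scipy.special import logsumexp

z = np.array([1000.0, 1001.0, 999.0])   # large logits

# Naive softmax overflows: exp(1000) is inf in float64, giving inf/inf = nan.
naive = np.exp(z) / np.sum(np.exp(z))   # RuntimeWarning, all nan

# Via LSE: log softmax(z)_i = z_i - LSE(z), then exponentiating is safe.
stable = np.exp(z - logsumexp(z))
print(stable)                            # approx [0.245, 0.665, 0.090]
```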


Here is the notation used below:
$$
\begin{aligned}
&\text{input: } z \in \mathbb{R}^n \\
&\arg\max(z) = j, \quad \max z = z_j \\
&\hat{z}_i = z_i - \max z = z_i - z_j \\
&\mathrm{LogSumExp}(z) = \log\Big(\sum_{k=1}^{n}\exp(z_k - \max z)\Big) + \max z = \log\Big(\sum_{k=1}^{n}\exp(\hat{z}_k)\Big) + z_j \\
&\mathrm{LSE} = \mathrm{LogSumExp}
\end{aligned}
$$
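To make the max-shift concrete, here is a minimal NumPy sketch of the forward computation (my own illustration, not the course's reference implementation; the function name is made up):

```python
import numpy as np

def logsumexp(z):
    """Numerically stable LogSumExp of a 1-D array z.

    Implements log(sum_k exp(z_k - max z)) + max z from the definition above:
    after subtracting max z, the largest exponent is exp(0) = 1, so the sum
    cannot overflow.
    """
    z = np.asarray(z, dtype=float)
    z_max = np.max(z)          # max z = z_j
    z_hat = z - z_max          # \hat{z}_k = z_k - max z
    return np.log(np.sum(np.exp(z_hat))) + z_max

# Matches the naive formula on small inputs ...
z = np.array([0.5, -1.0, 2.0])
assert np.isclose(logsumexp(z), np.log(np.sum(np.exp(z))))

# ... and stays finite where the naive formula would overflow.
print(logsumexp(np.array([1000.0, 1001.0, 999.0])))   # ~1001.41
```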


1. When $i \neq j$:
$$
\begin{aligned}
\frac{\partial \mathrm{LSE}}{\partial z_i}
&= \frac{\partial \mathrm{LSE}}{\partial \log\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \frac{\partial \log\sum_{k=1}^{n}\exp(\hat{z}_k)}{\partial z_i} + \frac{\partial \mathrm{LSE}}{\partial \max z} \cdot \frac{\partial \max z}{\partial z_i} \\
&= 1 \cdot \frac{\partial \log\sum_{k=1}^{n}\exp(\hat{z}_k)}{\partial \sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \frac{\partial \sum_{k=1}^{n}\exp(\hat{z}_k)}{\partial z_i} + 1 \cdot 0 \\
&= \frac{\partial \log\sum_{k=1}^{n}\exp(\hat{z}_k)}{\partial \sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \sum_{k=1}^{n}\frac{\partial \exp(\hat{z}_k)}{\partial z_i} \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \sum_{k=1}^{n}\left(\frac{\partial \exp(\hat{z}_k)}{\partial \hat{z}_k} \cdot \frac{\partial \hat{z}_k}{\partial z_i}\right) \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \sum_{k=1}^{n}\left(\exp(\hat{z}_k) \cdot \frac{\partial (z_k - \max z)}{\partial z_i}\right) \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \sum_{k=1}^{n}\Big(\exp(\hat{z}_k) \cdot \mathbb{I}(k=i)\Big) \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \exp(\hat{z}_i) \\
&= \frac{\exp(\hat{z}_i)}{\sum_{k=1}^{n}\exp(\hat{z}_k)}
\end{aligned}
$$
2. When $i = j$, i.e. $z_i = z_j = \max z$:
$$
\begin{aligned}
\frac{\partial \mathrm{LSE}}{\partial z_i}
&= \frac{\partial \mathrm{LSE}}{\partial \log\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \frac{\partial \log\sum_{k=1}^{n}\exp(\hat{z}_k)}{\partial z_i} + \frac{\partial \mathrm{LSE}}{\partial \max z} \cdot \frac{\partial \max z}{\partial z_i} \\
&= 1 \cdot \frac{\partial \log\sum_{k=1}^{n}\exp(\hat{z}_k)}{\partial \sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \frac{\partial \sum_{k=1}^{n}\exp(\hat{z}_k)}{\partial z_i} + 1 \cdot 1 \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \sum_{k=1}^{n}\frac{\partial \exp(\hat{z}_k)}{\partial z_i} + 1 \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \sum_{k=1}^{n}\left(\exp(\hat{z}_k) \cdot \frac{\partial (z_k - \max z)}{\partial z_i}\right) + 1 \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \left(\exp(\hat{z}_i) \cdot \frac{\partial (z_i - \max z)}{\partial z_i} + \sum_{k=1,\,k\neq i}^{n}\left(\exp(\hat{z}_k) \cdot \frac{\partial (z_k - \max z)}{\partial z_i}\right)\right) + 1 \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \left(\exp(\hat{z}_i) \cdot \frac{\partial (z_i - z_i)}{\partial z_i} + \sum_{k=1,\,k\neq i}^{n}\left(\exp(\hat{z}_k) \cdot \frac{\partial (z_k - z_i)}{\partial z_i}\right)\right) + 1 \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \left(\exp(\hat{z}_i) \cdot 0 + \sum_{k=1,\,k\neq i}^{n}\exp(\hat{z}_k) \cdot (-1)\right) + 1 \\
&= \frac{1}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \cdot \left(-\sum_{k=1,\,k\neq i}^{n}\exp(\hat{z}_k)\right) + 1 \\
&= \frac{-\sum_{k=1,\,k\neq i}^{n}\exp(\hat{z}_k)}{\sum_{k=1}^{n}\exp(\hat{z}_k)} + \frac{\sum_{k=1}^{n}\exp(\hat{z}_k)}{\sum_{k=1}^{n}\exp(\hat{z}_k)} \\
&= \frac{\exp(\hat{z}_i)}{\sum_{k=1}^{n}\exp(\hat{z}_k)}
\end{aligned}
$$

So in both cases the derivative of LSE with respect to $z_i$ is $\frac{\exp(\hat{z}_i)}{\sum_{k=1}^{n}\exp(\hat{z}_k)}$, which is exactly $\mathrm{softmax}(z)_i$ (softmax is unchanged by subtracting $\max z$).
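To sanity-check the result numerically, here is a small finite-difference comparison (again just an illustrative sketch, not the HW2 reference code):

```python
import numpy as np

def logsumexp(z):
    z_max = np.max(z)
    return np.log(np.sum(np.exp(z - z_max))) + z_max

def logsumexp_grad(z):
    """Analytic gradient: exp(z_hat_i) / sum_k exp(z_hat_k), i.e. softmax(z)."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def numerical_grad(f, z, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(z)
    for i in range(z.size):
        zp, zm = z.copy(), z.copy()
        zp[i] += eps
        zm[i] -= eps
        g[i] = (f(zp) - f(zm)) / (2 * eps)
    return g

z = np.array([0.3, -1.2, 2.5, 0.0])
print(np.allclose(logsumexp_grad(z), numerical_grad(logsumexp, z)))  # True
```

In a reverse-mode autodiff setting, the incoming gradient would simply be multiplied elementwise by this softmax term.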

P.S. Cover image source: What Is a Gradient in Machine Learning?

References:

1. 关于LogSumExp (About LogSumExp)