Backpropagation through the softmax loss layer

  Classification networks in deep learning generally use softmax together with cross-entropy as the loss function. For an introduction to softmax and cross entropy, see my other post, softmax loss. This post only explains how to differentiate the softmax loss layer for backpropagation.

  Suppose the output of the network's last layer is $\mathbf{z}$, its output after softmax is $\mathbf{p}$, and the ground-truth label is $\mathbf{y}$ (one-hot encoded). The loss function is:

$$L = - \sum_{i=1}^{C} y_i \log p_i$$

where $C$ is the number of classes.
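As a concrete check of the formula, the loss can be evaluated directly from the probabilities and the one-hot label (a minimal NumPy sketch; the values are illustrative):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # softmax output, sums to 1
y = np.array([1.0, 0.0, 0.0])   # one-hot label: true class is index 0

# L = -sum_i y_i * log(p_i); with a one-hot y this reduces to -log(p_true).
L = -np.sum(y * np.log(p))
```

Because only one entry of $\mathbf{y}$ is 1, the sum collapses to the negative log-probability of the true class.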

  Differentiating the softmax loss layer means computing $\frac{\partial L}{\partial \mathbf{z}}$, which we can derive component-wise via $\frac{\partial L}{\partial z_j}$:

$$\begin{aligned} \frac{\partial L}{\partial z_j} &= - \sum_{i=1}^{C} y_i \frac{\partial \log p_i}{\partial z_j} \\ &= - \sum_{i=1}^{C} \frac{y_i}{p_i} \frac{\partial p_i}{\partial z_j} \end{aligned}$$

  Since $\mathbf{p}$ is the output of the softmax function applied to $\mathbf{z}$, i.e. $\mathbf{p} = \mathrm{softmax}(\mathbf{z})$:

$$p_i = \frac{e^{z_i}}{\sum_{k=1}^{C} e^{z_k}}$$
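In code, softmax is usually implemented with the maximum logit subtracted before exponentiating, which avoids overflow without changing the result (a sketch; `softmax` is an assumed helper name, not from the post):

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) prevents overflow in exp(); the shift cancels
    # between numerator and denominator, so the output is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
# p sums to 1 and every entry lies in (0, 1).
```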

The derivative $\frac{\partial p_i}{\partial z_j}$ splits into two cases, $i = j$ and $i \neq j$, which are derived separately below.

When $i = j$:

$$\begin{aligned} \frac{\partial p_i}{\partial z_j} &= \frac{\partial p_j}{\partial z_j} \\ &= \frac{\partial}{\partial z_j} \frac{e^{z_j}}{\sum_{k=1}^{C} e^{z_k}} \\ &= \frac{e^{z_j}}{\sum_{k=1}^{C} e^{z_k}} + e^{z_j} \times (-1) \times \left(\frac{1}{\sum_{k=1}^{C} e^{z_k}}\right)^2 \times e^{z_j} \\ &= p_j - p_j^2 \\ &= p_j(1 - p_j) \end{aligned}$$

When $i \neq j$:

$$\begin{aligned} \frac{\partial p_i}{\partial z_j} &= e^{z_i} \times (-1) \times \left(\frac{1}{\sum_{k=1}^{C} e^{z_k}}\right)^2 \times e^{z_j} \\ &= -p_i p_j \end{aligned}$$
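The two cases together say that the Jacobian of softmax is $\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top$. This can be verified against finite differences (an illustrative sketch, not production code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)

# Analytic Jacobian: J[i, j] = p_i * (delta_ij - p_j) = diag(p) - p p^T.
J = np.diag(p) - np.outer(p, p)

# Numerical Jacobian via central differences, one logit at a time.
eps = 1e-6
J_num = np.zeros((3, 3))
for j in range(3):
    dz = np.zeros(3)
    dz[j] = eps
    J_num[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)
```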

Therefore,

$$\begin{aligned} \frac{\partial L}{\partial z_j} &= -\frac{y_j}{p_j} p_j(1 - p_j) - \sum_{i \neq j} \frac{y_i}{p_i}(-p_i p_j) \\ &= -y_j(1 - p_j) + p_j \sum_{i \neq j} y_i \\ &= -y_j + p_j \sum_{i=1}^{C} y_i \\ &= p_j - y_j \end{aligned}$$

where the last step uses $\sum_{i=1}^{C} y_i = 1$, since $\mathbf{y}$ is one-hot.

In vector form,

$$\frac{\partial L}{\partial \mathbf{z}} = \mathbf{p} - \mathbf{y}$$
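Putting everything together, the backward pass of the combined softmax cross-entropy layer is simply $\mathbf{p} - \mathbf{y}$, which a finite-difference check of the loss confirms (a sketch under the same one-hot assumption; names are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, y):
    # L = -sum_i y_i * log(softmax(z)_i)
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])    # one-hot label: true class is index 1

grad = softmax(z) - y            # analytic gradient dL/dz = p - y

# Numerical gradient via central differences, one component at a time.
eps = 1e-6
grad_num = np.array([
    (loss(z + eps * e, y) - loss(z - eps * e, y)) / (2 * eps)
    for e in np.eye(3)
])
```

This single subtraction is why frameworks fuse softmax and cross-entropy into one layer: the combined gradient is both cheaper and more numerically stable than chaining the two separate derivatives.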
