Differentiating the Cross-Entropy Loss

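Cross entropy is a loss function commonly used in classification problems. Earlier posts covered the proof of binary cross-entropy and the role of cross-entropy; this post walks through the derivative of the cross-entropy loss.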
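First, denote the outputs of the model's last layer of neurons by $f_0, \dots, f_{C-1}$, where $C$ is the number of classes. After the softmax activation, the outputs are denoted $p_0, \dots, p_{C-1}$, where: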
$$p_i = \frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}}$$
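As a minimal sketch (plain Python, standard library only; the function name `softmax` is our own choice), the formula can be computed as below. Subtracting $\max_k f_k$ from every logit is a common stabilizing trick that is not part of the formula above: the factor $e^{-\max_k f_k}$ cancels between numerator and denominator, so the result is unchanged, but `math.exp` no longer overflows for large logits.

```python
import math


def softmax(f):
    """Softmax over the logits f_0, ..., f_{C-1} (a list of floats)."""
    m = max(f)  # shift by the largest logit; the shift cancels in the ratio
    exps = [math.exp(fk - m) for fk in f]
    total = sum(exps)
    return [e / total for e in exps]


p = softmax([2.0, 1.0, 0.1])
print([round(x, 3) for x in p])  # → [0.659, 0.242, 0.099]
print(softmax([1000.0, 1000.0]))  # → [0.5, 0.5]; no overflow thanks to the shift
```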
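Denote the true labels by $y_0, \dots, y_{C-1}$ (typically a one-hot vector; in general a probability distribution with $\sum_i y_i = 1$). The cross-entropy loss $L$ is then: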
$$L = -\sum_{i=0}^{C-1} y_i \log p_i$$
Here $\log$ is shorthand; to make the differentiation convenient, we take its base to be $e$, i.e. $\log$ means $\ln$.
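A minimal sketch of the loss with the natural logarithm (`math.log` with one argument is base $e$). It assumes `y` and `p` are equal-length lists; skipping terms with $y_i = 0$ leaves the sum unchanged and avoids evaluating $\log 0$. For a one-hot label only one term survives, so $L$ is just $-\ln p$ of the true class.

```python
import math


def cross_entropy(y, p):
    """L = -sum_i y_i * ln(p_i) for equal-length lists y (labels) and p (probabilities)."""
    # Terms with y_i == 0 contribute nothing; skipping them also avoids log(0).
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi != 0)


y = [1.0, 0.0, 0.0]  # one-hot label: the true class is 0
p = [0.7, 0.2, 0.1]  # softmax output
print(cross_entropy(y, p))  # same value as -math.log(0.7)
```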
Now take the partial derivative of $L$ with respect to the output of the $i$-th neuron, $\frac{\partial L}{\partial f_i}$.
By the chain rule:
$$\frac{\partial L}{\partial f_i} = \sum_{j=0}^{C-1} \frac{\partial L_j}{\partial p_j}\frac{\partial p_j}{\partial f_i}$$
where $L_j = -y_j \log p_j$ is the $j$-th term of the loss, so that $L = \sum_j L_j$.
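A note on indices: the softmax formula uses the subscripts $i$ and $k$, and the cross-entropy uses $i$, but these two $i$'s are not the same index. The softmax denominator contains the output $f$ of every neuron, so every activated output $p$ has a nonzero partial derivative with respect to any $f_i$; at the same time, $L$ contains every $p$. To keep the indices from clashing, we therefore give $p$ a new subscript $j$, which takes the $C$ values $0, \dots, C-1$.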
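To see numerically why the sum over $j$ is needed, the sketch below nudges a single logit $f_i$ and measures how every $p_j$ moves, using central differences. The helper `sensitivity`, the step size `h`, and the test logits are illustrative assumptions, not part of the derivation; `softmax` is repeated so the snippet runs on its own.

```python
import math


def softmax(f):
    m = max(f)
    exps = [math.exp(fk - m) for fk in f]
    total = sum(exps)
    return [e / total for e in exps]


def sensitivity(f, i, h=1e-6):
    """Central-difference estimate of dp_j/df_i for every j (illustrative helper)."""
    f_plus = list(f)
    f_plus[i] += h
    f_minus = list(f)
    f_minus[i] -= h
    p_plus, p_minus = softmax(f_plus), softmax(f_minus)
    return [(a - b) / (2 * h) for a, b in zip(p_plus, p_minus)]


f = [2.0, 1.0, 0.1]
d = sensitivity(f, 0)
print(d)       # all three entries are nonzero: nudging f_0 moves every p_j
print(sum(d))  # close to 0, because the p_j always sum to 1
```

Every entry is nonzero, so no term of the chain-rule sum can be dropped in advance.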
Now differentiate each factor in turn:

$$\frac{\partial L_j}{\partial p_j} = \frac{\partial(-y_j \log p_j)}{\partial p_j}$$

Since we take $\log$ to have base $e$, i.e. $\log$ means $\ln$:

$$\frac{\partial L_j}{\partial p_j} = \frac{\partial(-y_j \ln p_j)}{\partial p_j} = -\frac{y_j}{p_j}$$
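A quick finite-difference check of $\frac{\partial L_j}{\partial p_j} = -\frac{y_j}{p_j}$, treating the $p_j$ as free variables (as they are at this step of the chain rule). Because only $L_j$ contains $p_j$, differentiating the whole $L$ with respect to $p_j$ gives the same value. The values of `y` and `p` and the helper names are arbitrary illustrative choices; `p` here is not produced by a softmax.

```python
import math


def loss_of_p(y, p):
    """L = sum_j L_j with L_j = -y_j * ln(p_j); p is treated as free variables."""
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p) if yj != 0)


def numeric_dL_dp(y, p, j, h=1e-6):
    """Central-difference estimate of dL/dp_j (illustrative helper)."""
    p_plus = list(p)
    p_plus[j] += h
    p_minus = list(p)
    p_minus[j] -= h
    return (loss_of_p(y, p_plus) - loss_of_p(y, p_minus)) / (2 * h)


y = [0.0, 1.0, 0.0]
p = [0.2, 0.5, 0.3]
for j in range(len(p)):
    analytic = -y[j] / p[j]
    print(j, analytic, numeric_dL_dp(y, p, j))
```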

Next we need $\frac{\partial p_j}{\partial f_i}$. Every $p_j$ contains $f_i$ (at least through the denominator), so $\frac{\partial p_j}{\partial f_i}$ is never 0; however, the result differs between $j = i$ and $j \neq i$, so the two cases are handled separately:

  • First, the case $j = i$. Apply the quotient rule, noting that the derivative of the denominator with respect to $f_i$ is $e^{f_i}$ (only the $k = i$ term depends on $f_i$):
    $$\frac{\partial p_j}{\partial f_i} = \frac{\partial p_i}{\partial f_i} = \frac{\partial}{\partial f_i}\left(\frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}}\right)$$
    $$= \frac{(e^{f_i})' \sum_{k=0}^{C-1} e^{f_k} - e^{f_i}\left(\sum_{k=0}^{C-1} e^{f_k}\right)'}{\left(\sum_{k=0}^{C-1} e^{f_k}\right)^2}$$
    $$= \frac{e^{f_i}\sum_{k=0}^{C-1} e^{f_k} - (e^{f_i})^2}{\left(\sum_{k=0}^{C-1} e^{f_k}\right)^2} = \frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}} - \left(\frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}}\right)^2$$
    $$= p_i - p_i^2 = p_i(1 - p_i)$$

  • Then the case $j \neq i$. Now the numerator $e^{f_j}$ does not depend on $f_i$, so $(e^{f_j})' = 0$, while the derivative of the denominator is again $e^{f_i}$:
    $$\frac{\partial p_j}{\partial f_i} = \frac{\partial}{\partial f_i}\left(\frac{e^{f_j}}{\sum_{k=0}^{C-1} e^{f_k}}\right)$$
    $$= \frac{(e^{f_j})' \sum_{k=0}^{C-1} e^{f_k} - e^{f_j}\left(\sum_{k=0}^{C-1} e^{f_k}\right)'}{\left(\sum_{k=0}^{C-1} e^{f_k}\right)^2}$$
    $$= \frac{-e^{f_i} e^{f_j}}{\left(\sum_{k=0}^{C-1} e^{f_k}\right)^2} = -\frac{e^{f_i}}{\sum_{k=0}^{C-1} e^{f_k}} \cdot \frac{e^{f_j}}{\sum_{k=0}^{C-1} e^{f_k}}$$
    $$= -p_i p_j$$

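Both cases can be collected into one matrix, the softmax Jacobian $J_{ji} = \frac{\partial p_j}{\partial f_i} = p_j(\delta_{ji} - p_i)$, where $\delta_{ji}$ is 1 if $j = i$ and 0 otherwise. The sketch below builds it from the two case formulas and compares it with a finite-difference estimate; the helper names and the test logits are illustrative assumptions.

```python
import math


def softmax(f):
    m = max(f)
    exps = [math.exp(fk - m) for fk in f]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(p):
    """J[j][i] = dp_j/df_i: p_i*(1 - p_i) when j == i, and -p_i*p_j otherwise."""
    C = len(p)
    return [
        [p[i] * (1 - p[i]) if j == i else -p[i] * p[j] for i in range(C)]
        for j in range(C)
    ]


def numeric_jacobian(f, h=1e-6):
    """Central-difference estimate of the same matrix (illustrative helper)."""
    C = len(f)
    J = [[0.0] * C for _ in range(C)]
    for i in range(C):
        f_plus = list(f)
        f_plus[i] += h
        f_minus = list(f)
        f_minus[i] -= h
        p_plus, p_minus = softmax(f_plus), softmax(f_minus)
        for j in range(C):
            J[j][i] = (p_plus[j] - p_minus[j]) / (2 * h)
    return J


f = [2.0, 1.0, 0.1]
J_analytic = softmax_jacobian(softmax(f))
J_numeric = numeric_jacobian(f)
max_err = max(
    abs(J_analytic[j][i] - J_numeric[j][i]) for j in range(3) for i in range(3)
)
print(max_err)  # tiny: the two case formulas match the numerical derivative
```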
Adding the two cases gives the final partial derivative: the single term $j = i$ plus the sum over all $j \neq i$:
$$\frac{\partial L}{\partial f_i} = \frac{\partial L_i}{\partial p_i}\frac{\partial p_i}{\partial f_i} + \sum_{j \neq i} \frac{\partial L_j}{\partial p_j}\frac{\partial p_j}{\partial f_i}$$
$$= -\frac{y_i}{p_i}\,p_i(1 - p_i) + \sum_{j \neq i} (-p_i p_j)\left(-\frac{y_j}{p_j}\right)$$
$$= -y_i(1 - p_i) + \sum_{j \neq i} p_i y_j$$
$$= y_i p_i - y_i + \sum_{j \neq i} p_i y_j$$

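In this expression the sum over $j \neq i$ is missing exactly the $j = i$ term, and $y_i p_i$ is that term, so the two combine into a sum over all $j$: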
$$= \sum_{j=0}^{C-1} p_i y_j - y_i$$
$$= p_i \sum_{j=0}^{C-1} y_j - y_i$$
Since $\sum_{j=0}^{C-1} y_j = 1$ (the labels form a probability distribution; a one-hot label has exactly one $y_j = 1$), we get:
$$\frac{\partial L}{\partial f_i} = p_i \sum_{j=0}^{C-1} y_j - y_i = p_i - y_i$$
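As an end-to-end check of $\frac{\partial L}{\partial f_i} = p_i - y_i$, the sketch below differentiates $L(\mathrm{softmax}(f))$ numerically and compares the result with the closed form. The logits, the label, the step size, and the helper names are arbitrary illustrative choices.

```python
import math


def softmax(f):
    m = max(f)
    exps = [math.exp(fk - m) for fk in f]
    total = sum(exps)
    return [e / total for e in exps]


def loss(f, y):
    """Cross-entropy of softmax(f) against labels y, with the natural log."""
    p = softmax(f)
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p) if yj != 0)


def numeric_grad(f, y, h=1e-6):
    """Central-difference estimate of dL/df_i for every i (illustrative helper)."""
    grad = []
    for i in range(len(f)):
        f_plus = list(f)
        f_plus[i] += h
        f_minus = list(f)
        f_minus[i] -= h
        grad.append((loss(f_plus, y) - loss(f_minus, y)) / (2 * h))
    return grad


f = [2.0, 1.0, 0.1]
y = [0.0, 0.0, 1.0]  # one-hot label: the true class is 2
closed_form = [pi - yi for pi, yi in zip(softmax(f), y)]
print(closed_form)
print(numeric_grad(f, y))  # agrees with p - y up to finite-difference error
```

One practical consequence of the closed form: the backward pass through softmax plus cross-entropy needs only a subtraction, with no division by a possibly tiny $p_j$.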
