Softmax Derivation

Let's discuss the simplest case, taking a neural network as an example:

     Suppose the input to the softmax layer is a one-dimensional array of length N, and the output is the probability of each of C classes.

--1. Input: x --> input data of dimension N, written as $$(x_0, x_1, x_2, ..., x_{N-1})$$; in the neural network this is the output of the last hidden layer.

               W, b --> the affine weights, with shapes (N, C) and (C,)

                 y --> the target label of the data; y takes a value in {0, ..., C-1}. I will transform y into a one-hot vector of the form $$(y_0, y_1, ..., y_k, ..., y_{C-1}),\ where\ y_k=1\ and\ y_i=0\ for\ i \neq k$$.

--2: Derivation

      Define the softmax output as $$S_i = \frac{e^{z_i}}{\sum_{j}{e^{z_j}}},\ i \in \{0, ..., C-1\},\ where\ z_i = W_{:, i}^T x + b_i$$.
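As a sanity check, here is a minimal NumPy sketch of $$S_i$$ (the function name `softmax` and the max-subtraction trick for numerical stability are my own additions, not from the derivation above):

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) does not change S (it cancels in the ratio),
    # but it prevents overflow in exp for large z.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
S = softmax(z)
print(S.sum())  # probabilities sum to 1
```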

     Define the loss function (note: this is the log-likelihood to be maximized; the usual cross-entropy loss is its negative):

                                                         $$Loss = \sum_{i=0}^{C-1}{y_i \log(S_i)}$$
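For a concrete example (the numbers here are mine, chosen for illustration), with C = 3 and target class k = 1, the one-hot vector picks out the single term $$\log(S_k)$$:

```python
import numpy as np

S = np.array([0.2, 0.5, 0.3])  # an assumed softmax output
y = np.array([0.0, 1.0, 0.0])  # one-hot target, k = 1

# One-hot y zeroes out every term except log(S_k).
loss = np.sum(y * np.log(S))
print(loss)  # equals log(0.5)
```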

     Compute the backpropagated gradients dx, dW:

     Here I first compute the intermediate gradient dz, then use dz to derive dx and dW.

                                                    $$\frac{\partial Loss}{\partial z_i}=\sum_{j}{\frac{\partial Loss}{\partial S_j} \frac{\partial S_j}{\partial z_i}}$$

                                                     $$\frac{\partial Loss}{\partial S_j} = y_j \times \frac{1}{S_j}$$

                                                       $$\frac{\partial S_j}{\partial z_i} = -S_i S_j,\ if\ i \neq j. $$

                                                      $$\frac{\partial S_j}{\partial z_i} = (1-S_i)S_i,\ if\ i = j. $$
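The two cases above can be collected into the Jacobian $$\mathrm{diag}(S) - S S^T$$. A quick finite-difference check of that claim (the variable names and test values are my own):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0])
S = softmax(z)

# Analytic Jacobian: J[i, j] = dS_j/dz_i = S_i*(1-S_i) on the diagonal,
# -S_i*S_j off the diagonal.
J = np.diag(S) - np.outer(S, S)

# Numerical Jacobian via central differences.
eps = 1e-6
J_num = np.zeros((3, 3))
for i in range(3):
    d = np.zeros(3)
    d[i] = eps
    J_num[i] = (softmax(z + d) - softmax(z - d)) / (2 * eps)

print(np.allclose(J, J_num, atol=1e-7))
```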

So:

                                                      $$\frac{\partial Loss}{\partial z_i} = \sum_{j=0}^{j=C-1} \frac{y_j}{S_j} \times \delta_{j} $$

where:

                                                     $$\delta_{j} = \frac{\partial S_j}{\partial z_i} = (1-S_i)S_i,\ if\ j=i,\ else\ -S_i S_j$$

                                                    $$\frac{\partial Loss}{\partial z_i} = y_i (1-S_i) + \sum_{j \neq i} y_j( -S_i)$$

Since y is one-hot, $$\sum_{j} y_j = 1$$, so this simplifies to:

                                                   $$\frac{\partial Loss}{\partial z_i} = y_i(1-S_i) - S_i \sum_{j \neq i} y_j = y_i - S_i \sum_{j} y_j = y_i - S_i$$

In vector form $$dz = y - S$$: the component at the target label k is $$1 - S_k$$, and every other component i is $$-S_i$$.

The same expression holds for every component of z; dx and dW then follow from dz through the affine layer: $$dW = x\, dz^T$$ (shape (N, C)), $$dx = W\, dz$$, and $$db = dz$$.

                Discussion welcome!
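Putting everything together, here is a sketch of the full forward/backward pass under the sign convention used above ($$Loss = \sum_i y_i \log S_i$$, so $$dz = y - S$$); the random test values, shapes, and the finite-difference check are my own additions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
N, C, k = 4, 3, 2                  # example dims and target label
x = rng.normal(size=N)
W = rng.normal(size=(N, C))
b = rng.normal(size=C)
y = np.zeros(C)
y[k] = 1.0                         # one-hot target

# Forward: z = W^T x + b, S = softmax(z), Loss = sum_i y_i log S_i = log S_k.
z = W.T @ x + b
S = softmax(z)

# Backward: dz = y - S, then chain through the affine layer.
dz = y - S
dW = np.outer(x, dz)               # shape (N, C), dLoss/dW
dx = W @ dz                        # shape (N,),  dLoss/dx
db = dz

# Finite-difference check of one entry of dW.
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
Wm = W.copy(); Wm[0, 0] -= eps
num = (np.log(softmax(Wp.T @ x + b))[k]
       - np.log(softmax(Wm.T @ x + b))[k]) / (2 * eps)
print(np.isclose(dW[0, 0], num, atol=1e-6))
```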
