Softmax Derivation

AugustMoore

Categories: machine learning, Mathmatic. Tags: Mathmatic, deep learning, neural network

Article link: https://blog.csdn.net/AugustMoore/article/details/84671177


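We consider the simplest case, using a neural network as the example.

Suppose the softmax layer receives a 1-D array of N values as input and outputs one probability per class for a multi-class problem with C classes.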

--1. input: x --> the input data with dimension N, which can be written as $(x_0, x_1, x_2, ..., x_{N-1})$; in a neural network, this is the output of the last hidden layer.

W, b --> the affine weight and bias, with shapes (N, C) and (C, ).

y --> the target label of the data, an integer in $\{0, 1, ..., C-1\}$. I will transform y into a one-hot vector of the form $(y_0, y_1, ..., y_k, ..., y_{C-1})$, where $y_k=1$ and $y_i=0$ for $i \neq k$.

--2. Derivation

Define the output of the softmax layer as $S_i$ for $i \in \{0, ..., C-1\}$: $S_i= \frac{e^{z_i}}{\sum_{j}{e^{z_j}}}$, where $z_i = W_{\cdot, i}^T x + b_i$ is the i-th output of the affine layer.
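The definition of $z$ and $S$ can be sketched in plain Python as follows. This is only an illustration: the helper names `affine` and `softmax` and the toy values of `x`, `W`, `b` are assumptions made for this example, and the max-subtraction is a common numerical-stability trick rather than part of the derivation.

```python
import math

def affine(W, b, x):
    """z[i] = sum_n W[n][i] * x[n] + b[i], with W of shape (N, C)."""
    return [sum(W[n][i] * x[n] for n in range(len(x))) + b[i]
            for i in range(len(b))]

def softmax(z):
    """S[i] = exp(z[i]) / sum_j exp(z[j])."""
    # Subtracting max(z) does not change S (the common factor cancels)
    # but keeps exp() from overflowing for large scores.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values: N = 2 inputs, C = 3 classes.
x = [1.0, 2.0]
W = [[0.1, 0.2, 0.3],
     [0.0, -0.1, 0.4]]
b = [0.0, 0.1, -0.2]

z = affine(W, b, x)   # class scores
S = softmax(z)        # class probabilities, summing to 1
```

The entries of `S` are positive and sum to 1, which is what lets them be read as class probabilities.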

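Define the loss function:

$Loss = \sum_{i=0}^{C-1}{y_i \log(S_i)}$

Note that, as written, this is the log-likelihood of the target class, i.e. the negative of the usual cross-entropy loss. It is therefore maximized rather than minimized, and the gradients derived from it have the opposite sign from the cross-entropy gradients.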
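As a small sketch of this definition, the snippet below evaluates the sum for a one-hot `y`. The helper names `one_hot` and `log_likelihood` and the probabilities in `S` are made up for illustration; the sign convention follows the formula above (no leading minus).

```python
import math

def one_hot(k, C):
    """Length-C vector with a 1 at index k and 0 elsewhere."""
    return [1.0 if i == k else 0.0 for i in range(C)]

def log_likelihood(y, S):
    """sum_i y[i] * log(S[i]); terms with y[i] == 0 contribute nothing."""
    return sum(yi * math.log(si) for yi, si in zip(y, S) if yi != 0.0)

S = [0.2, 0.7, 0.1]          # hypothetical softmax output
y = one_hot(1, 3)            # target label k = 1
loss = log_likelihood(y, S)  # only the k-th term survives: log(S[1])
```

Because `y` is one-hot, the sum reduces to the single term $\log(S_k)$, which is never positive since $S_k \le 1$.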

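Compute the backpropagation gradients dx and dW:

Here I first compute the gradient of the intermediate variable, dz, and then use dz to derive dx and dW.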

$\frac{\partial Loss}{\partial z_i}=\sum_{j=0}^{C-1}{\frac{\partial Loss}{\partial S_j} \frac{\partial S_j}{\partial z_i}}$

$\frac{\partial Loss}{\partial S_j} = y_j \times \frac{1}{S_j}$

$\frac{\partial S_j}{\partial z_i} = -S_i S_j,\ \text{if}\ i \neq j.$

$\frac{\partial S_j}{\partial z_i} = (1-S_i)S_i,\ \text{if}\ i = j.$
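The two cases for $\frac{\partial S_j}{\partial z_i}$ can be checked numerically. The sketch below builds the Jacobian from the formulas and compares it with central finite differences; the function names, the scores in `z`, and the step size `eps` are arbitrary choices for this illustration.

```python
import math

def softmax(z):
    m = max(z)  # shift for numerical stability; does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(S):
    """J[j][i] = dS_j/dz_i: S_i*(1 - S_i) if i == j, else -S_i*S_j."""
    C = len(S)
    return [[S[i] * (1.0 - S[i]) if i == j else -S[i] * S[j]
             for i in range(C)]
            for j in range(C)]

def numeric_jacobian(z, eps=1e-6):
    """Central finite differences of softmax, in the same J[j][i] layout."""
    C = len(z)
    J = [[0.0] * C for _ in range(C)]
    for i in range(C):
        z_plus = list(z)
        z_plus[i] += eps
        z_minus = list(z)
        z_minus[i] -= eps
        S_plus, S_minus = softmax(z_plus), softmax(z_minus)
        for j in range(C):
            J[j][i] = (S_plus[j] - S_minus[j]) / (2 * eps)
    return J

z = [0.5, -1.0, 2.0]  # arbitrary scores for illustration
J_exact = softmax_jacobian(softmax(z))
J_num = numeric_jacobian(z)
max_err = max(abs(J_exact[j][i] - J_num[j][i])
              for j in range(3) for i in range(3))
```

If the formulas are right, `max_err` should be tiny, limited only by finite-difference and rounding error.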

So:

$\frac{\partial Loss}{\partial z_i} = \sum_{j=0}^{C-1} \frac{y_j}{S_j} \times \delta_{j}$

where $\delta_j$ stands for $\frac{\partial S_j}{\partial z_i}$:

$\delta_{j} = (1-S_i)S_i\ \text{if}\ j=i,\ \text{else}\ -S_i S_j$

The $S_j$ in each denominator cancels, giving:

$\frac{\partial Loss}{\partial z_i} = y_i (1-S_i) + \sum_{j \neq i} y_j( -S_i)$

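Collecting the $S_i$ terms, this can be rewritten as:

$\frac{\partial Loss}{\partial z_i} = y_i - S_i \sum_{j=0}^{C-1} y_j$

The one-hot property means exactly one entry of y is nonzero, so $\sum_j y_j = 1$ and the expression simplifies to:

$\frac{\partial Loss}{\partial z_i} = y_i - S_i$

In particular, for the target label k (where $y_k = 1$), $\frac{\partial Loss}{\partial z_k} = y_k - S_k = 1 - S_k$, and for every other index $i \neq k$ the gradient is $-S_i$.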
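Carrying the sum through with a one-hot y gives $y_i - S_i$ for each score $z_i$, which agrees with $y_k - S_k$ at the target index. The sketch below compares that closed form with finite differences of the loss as defined in this post; the values of `z` and `y` and the step `eps` are assumptions made for the example.

```python
import math

def softmax(z):
    m = max(z)  # shift for numerical stability; does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, y):
    """Loss = sum_i y[i] * log(S[i]), with the sign used in this post."""
    S = softmax(z)
    return sum(yi * math.log(si) for yi, si in zip(y, S))

z = [0.5, -1.0, 2.0]  # arbitrary scores for illustration
y = [0.0, 0.0, 1.0]   # one-hot target, k = 2

# Closed form: dLoss/dz_i = y_i - S_i
S = softmax(z)
dz = [yi - si for yi, si in zip(y, S)]

# Central finite differences of the loss, one coordinate at a time
eps = 1e-6
dz_num = []
for i in range(len(z)):
    z_plus = list(z)
    z_plus[i] += eps
    z_minus = list(z)
    z_minus[i] -= eps
    dz_num.append((loss(z_plus, y) - loss(z_minus, y)) / (2 * eps))

max_err = max(abs(a - n) for a, n in zip(dz, dz_num))
```

The entries of `dz` sum to zero, positive at the target index and negative elsewhere, so a step along this gradient raises the target score relative to the others.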

The same reasoning applies to every component of z; dx and dW then follow from dz by the chain rule.
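Since $z_i = \sum_n W_{n,i} x_n + b_i$, the chain rule gives $dW_{n,i} = x_n \, dz_i$, $db_i = dz_i$ and $dx_n = \sum_i W_{n,i} \, dz_i$. A minimal sketch under these definitions, using made-up toy values and taking $dz = y - S$ (the gradient of the loss as defined in this post, without a leading minus):

```python
import math

def softmax(z):
    m = max(z)  # shift for numerical stability; does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def affine(W, b, x):
    """z[i] = sum_n W[n][i] * x[n] + b[i], with W of shape (N, C)."""
    return [sum(W[n][i] * x[n] for n in range(len(x))) + b[i]
            for i in range(len(b))]

def loss(W, b, x, y):
    S = softmax(affine(W, b, x))
    return sum(yi * math.log(si) for yi, si in zip(y, S))

# Hypothetical toy values: N = 2 inputs, C = 3 classes, target k = 1.
x = [1.0, 2.0]
W = [[0.1, 0.2, 0.3],
     [0.0, -0.1, 0.4]]
b = [0.0, 0.1, -0.2]
y = [0.0, 1.0, 0.0]

S = softmax(affine(W, b, x))
dz = [yi - si for yi, si in zip(y, S)]                  # shape (C,)
dW = [[x[n] * dz[i] for i in range(len(dz))]            # shape (N, C)
      for n in range(len(x))]
db = list(dz)                                           # shape (C,)
dx = [sum(W[n][i] * dz[i] for i in range(len(dz)))      # shape (N,)
      for n in range(len(x))]

# Spot-check dx[0] against a central finite difference.
eps = 1e-6
dx0_num = (loss(W, b, [x[0] + eps, x[1]], y)
           - loss(W, b, [x[0] - eps, x[1]], y)) / (2 * eps)
```

If the usual cross-entropy loss (with the leading minus) is minimized instead, every gradient here simply flips sign.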

Discussion is welcome!
