Example of Gradient Back-propagation in LR
In this example, we do not cleanly separate the optimizer, the cost function, and the model as a modern machine learning framework would. We simply derive the gradient back-propagation for the whole process.
For the SGD Case
Forward propagation:
- $\text{logit}_{raw} = \sum_i w_i x_i + b$
- $h = \mathrm{sigmoid}(\text{logit}_{raw})$
- $L = -y_{true}\log(h) - (1 - y_{true})\log(1 - h), \quad y_{true} \in \{0, 1\}$
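As a minimal sketch of this forward pass (the function and variable names `forward`, `w`, `b`, `x`, `y_true` are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    # Logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x, y_true):
    # logit_raw = sum_i w_i * x_i + b
    logit_raw = np.dot(w, x) + b
    # h = sigmoid(logit_raw)
    h = sigmoid(logit_raw)
    # Cross-entropy loss for a single sample, y_true in {0, 1}.
    loss = -y_true * np.log(h) - (1.0 - y_true) * np.log(1.0 - h)
    return logit_raw, h, loss
```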
Back propagation:
- $\frac{\partial L}{\partial h} = -\frac{y_{true}}{h} + \frac{1 - y_{true}}{1 - h} = -\frac{y_{true} - h}{h(1 - h)}$
- $\frac{\partial h}{\partial \text{logit}_{raw}} = h(1 - h)$
- $\frac{\partial \text{logit}_{raw}}{\partial w_i} = x_i$
- $\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial h} \cdot \frac{\partial h}{\partial \text{logit}_{raw}} \cdot \frac{\partial \text{logit}_{raw}}{\partial w_i} = -x_i (y_{true} - h)$
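A matching sketch of the back-propagation for a single sample, following the chain-rule factors above (again, the names are assumptions for illustration):

```python
def backward(x, y_true, h):
    # dL/dh = -(y_true - h) / (h * (1 - h))
    dL_dh = -(y_true - h) / (h * (1.0 - h))
    # dh/dlogit_raw = h * (1 - h)
    dh_dlogit = h * (1.0 - h)
    # Chain rule: dL/dlogit_raw = dL/dh * dh/dlogit_raw, which simplifies to (h - y_true).
    dL_dlogit = dL_dh * dh_dlogit
    # dlogit_raw/dw_i = x_i, so dL/dw_i = -x_i * (y_true - h).
    dL_dw = dL_dlogit * x
    # dlogit_raw/db = 1, so dL/db = -(y_true - h).
    dL_db = dL_dlogit
    return dL_dw, dL_db
```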
So the gradient with respect to $w_i$ for a single sample is $-x_i(y_{true} - h)$.
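Putting the two together, one SGD update on a single sample could look like this sketch, reusing `forward` and `backward` from above (the learning rate `lr` is an assumed hyperparameter):

```python
def sgd_step(w, b, x, y_true, lr=0.1):
    # One stochastic gradient descent update on a single sample.
    _, h, _ = forward(w, b, x, y_true)
    dL_dw, dL_db = backward(x, y_true, h)
    w = w - lr * dL_dw
    b = b - lr * dL_db
    return w, b
```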
For the Batch-GD Case
In the batch-GD case, the data flow is simply the SGD flows for all samples in the batch combined.
By the chain rule of multivariable calculus [1], the gradient with respect to $w_i$ is the sum of the gradients from each back-propagation flow.
Gradient:
$\frac{\partial L_{TOTAL}}{\partial w_i} = -\sum_{n} x_i^{(n)} \left( y_{true}^{(n)} - h^{(n)} \right)$
where the sum runs over the samples $n$ in the batch.
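A sketch of the batch-GD gradient, summing the per-sample gradients in vectorized form (reusing `sigmoid` and the NumPy import from the first sketch; the batch layout, `X` of shape `(N, D)` and `Y` of shape `(N,)`, is an assumption):

```python
def batch_gradient(w, b, X, Y):
    # Forward pass over the whole batch: X has shape (N, D), Y has shape (N,).
    logits = X @ w + b          # shape (N,)
    H = sigmoid(logits)         # shape (N,)
    # dL_TOTAL/dw_i = -sum_n x_i^(n) * (y_true^(n) - h^(n))
    dL_dw = -X.T @ (Y - H)      # shape (D,)
    dL_db = -np.sum(Y - H)
    return dL_dw, dL_db
```

The matrix product `X.T @ (Y - H)` carries out exactly the summation over samples in the formula above, one component per weight $w_i$.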
Reference:
[1] Chain Rule - Wikipedia
[2] CS231n Back Propagation