Example of Chain Rule in Gradient Back-propagation in LR

In this example, we do not cleanly separate the optimizer, the cost function, and the model as a modern machine learning framework would. We simply derive the gradient back-propagation for the whole process.

The SGD Case

Forward propagation:

  • $logit_{raw} = \sum_i w_i x_i + b$
  • $h = \mathrm{sigmoid}(logit_{raw})$
  • $L = -y_{true} \times \log(h) - (1 - y_{true}) \times \log(1 - h), \quad y_{true} \in \{0, 1\}$
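
As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The helper names `forward` and `loss` are my own choices for this post, not taken from any particular framework.

```python
import numpy as np

def forward(w, b, x):
    """Forward pass for a single sample: logit_raw = sum_i w_i*x_i + b, h = sigmoid(logit_raw)."""
    logit_raw = np.dot(w, x) + b
    h = 1.0 / (1.0 + np.exp(-logit_raw))
    return logit_raw, h

def loss(h, y_true):
    """Cross-entropy loss: L = -y_true*log(h) - (1 - y_true)*log(1 - h)."""
    return -y_true * np.log(h) - (1.0 - y_true) * np.log(1.0 - h)
```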

Back propagation:

  • $\frac{\partial L}{\partial h} = -y_{true} \cdot \frac{1}{h} - (1 - y_{true}) \cdot \frac{1}{1 - h} \cdot (-1) = -\frac{y_{true} - h}{h(1 - h)}$
  • $\frac{\partial h}{\partial logit_{raw}} = h(1 - h)$
  • $\frac{\partial logit_{raw}}{\partial w_i} = x_i$
  • $\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial h} \cdot \frac{\partial h}{\partial logit_{raw}} \cdot \frac{\partial logit_{raw}}{\partial w_i} = -x_i (y_{true} - h)$

So the gradient with respect to $w_i$ is derived as above.
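
A sketch of the corresponding backward pass, reusing the hypothetical `forward` and `loss` helpers from above; the finite-difference check is only there to sanity-check the derivation, and the sample values are arbitrary.

```python
def grad_w(w, b, x, y_true):
    """Analytic gradient from the chain rule: dL/dw_i = -x_i * (y_true - h)."""
    _, h = forward(w, b, x)
    return -x * (y_true - h)

# Finite-difference sanity check of the analytic gradient.
w = np.array([0.5, -1.2, 0.3])
b = 0.1
x = np.array([1.0, 2.0, -0.5])
y_true = 1.0

eps = 1e-6
analytic = grad_w(w, b, x, y_true)
numeric = np.zeros_like(w)
for i in range(len(w)):
    w_hi, w_lo = w.copy(), w.copy()
    w_hi[i] += eps
    w_lo[i] -= eps
    numeric[i] = (loss(forward(w_hi, b, x)[1], y_true)
                  - loss(forward(w_lo, b, x)[1], y_true)) / (2 * eps)
print(analytic, numeric)  # the two gradients should agree closely
```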

The Batch-GD Case

In the batch-GD situation, the data flow is simply the per-sample SGD flows above combined, one flow per sample in the batch.
By the multivariate chain rule [1], the gradient with respect to $w_i$ is the sum of the gradients from each back-propagation flow:

$\frac{\partial L_{TOTAL}}{\partial w_i} = -\sum_j x_i^{(j)} \left(y_{true}^{(j)} - h^{(j)}\right)$

where $j$ indexes the samples in the batch.
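
A self-contained vectorized sketch of this batch gradient; the matrix `X` stacks the samples row-wise, and all names and values here are illustrative.

```python
import numpy as np

def batch_grad_w(w, b, X, y_true):
    """Batch gradient: dL_TOTAL/dw_i = -sum_j x_i^(j) * (y_true^(j) - h^(j))."""
    logits = X @ w + b                 # one logit per sample, shape (n_samples,)
    h = 1.0 / (1.0 + np.exp(-logits))  # per-sample sigmoid outputs
    return -X.T @ (y_true - h)         # sums the per-sample gradients over the batch

w = np.array([0.5, -1.2, 0.3])
b = 0.1
X = np.array([[1.0, 2.0, -0.5],        # each row is one sample's features
              [0.3, -1.0, 2.0]])
y = np.array([1.0, 0.0])
print(batch_grad_w(w, b, X, y))        # equals the sum of the two per-sample gradients
```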


References:
[1] Chain Rule - Wikipedia
[2] CS231n Back Propagation
