Basics of Neural Network Programming - Derivatives with a Computation Graph


In the last class, we worked through an example of using a computation graph to compute a function J. In this class, let's see how to use that graph to compute the derivatives of J.

Following is the computation graph:

Figure-1: the computation graph for J=3v, where v=a+u and u=bc (with a=5, b=3, c=2)
  • We have:

\frac{dJ}{dv}=3

If v is nudged from 11 to 11.001, then J increases from 33 to 33.003.
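This nudge can be checked numerically with a finite difference. A minimal Python sketch, assuming only what the graph gives us (v = 11 and J = 3v):

```python
# J depends on v as J = 3v (from the computation graph).
def J_of_v(v):
    return 3 * v

v = 11
eps = 0.001
# Finite-difference estimate of dJ/dv: nudge v and measure the change in J.
dJ_dv = (J_of_v(v + eps) - J_of_v(v)) / eps
print(dJ_dv)  # approximately 3, up to floating-point error
```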

  • Then, what is \frac{dJ}{da}?

Suppose a increases from 5 to 5.001. Then v=a+u=5.001+6=11.001, so J=3v=33.003.

So a change in the variable a propagates rightward through the computation graph, and the increase in J is 3x the increase in a. So we have:

\frac{dJ}{da}=3

Actually, this is just the chain rule from calculus:

\frac{dJ}{da}=\frac{dJ}{dv}\times\frac{dv}{da}=3\times1=3

So this little illustration shows how, having already computed \frac{dJ}{dv}, you can use it to compute \frac{dJ}{da}. This is one step of backward propagation.
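The chain-rule step can be sketched in Python as well; the full forward function below, J = 3(a + bc), is assumed from the graph:

```python
# Forward computation implied by the graph: u = bc, v = a + u, J = 3v.
def forward(a, b, c):
    u = b * c
    v = a + u
    return 3 * v   # J

a, b, c = 5, 3, 2
eps = 0.001
# Finite-difference check of dJ/da.
dJ_da_numeric = (forward(a + eps, b, c) - forward(a, b, c)) / eps
# Chain rule: dJ/da = dJ/dv * dv/da = 3 * 1.
dJ_da_chain = 3 * 1
print(dJ_da_numeric, dJ_da_chain)  # both approximately 3
```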

  • Then what is \frac{dJ}{du}?

\frac{dJ}{du}=\frac{dJ}{dv}\times\frac{dv}{du}=3\times1=3

If u bumps up from 6 to 6.001, then v goes up from 11 to 11.001, so J goes from 33 to 33.003.
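A quick numerical confirmation of this bump, holding a = 5 fixed as in the example:

```python
a = 5
# v = a + u and J = 3v, so J as a function of u alone:
def J_of_u(u):
    return 3 * (a + u)

eps = 0.001
# Bump u from 6 to 6.001 and measure the change in J.
dJ_du = (J_of_u(6 + eps) - J_of_u(6)) / eps
print(dJ_du)  # approximately 3
```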

  • What is \frac{dJ}{db}?

\frac{dJ}{db}=\frac{dJ}{dv}\times\frac{dv}{du}\times\frac{du}{db}=3\times1\times c=3\times2=6

If b bumps up from 3 to 3.001, then u goes to 6.002 (since c=2), v goes to 11.002, and J goes to 33.006.
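This two-step propagation (b through u and v to J) can be verified the same way; the forward function is again assumed from the graph:

```python
# Forward computation from the graph: u = bc, v = a + u, J = 3v.
def forward(a, b, c):
    u = b * c        # b = 3.001 gives u = 6.002, since c = 2
    v = a + u
    return 3 * v

eps = 0.001
# Finite-difference check of dJ/db.
dJ_db = (forward(5, 3 + eps, 2) - forward(5, 3, 2)) / eps
print(dJ_db)  # approximately 6, i.e. 3 * c
```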

  • Last, what is \frac{dJ}{dc}?

\frac{dJ}{dc}=\frac{dJ}{du}\times\frac{du}{dc}=3\times b=3\times3=9
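By symmetry with the previous step, a nudge in c is amplified by b = 3 through u, then by 3 through J. A quick check:

```python
# Forward computation from the graph, written in one line: J = 3(a + bc).
def forward(a, b, c):
    return 3 * (a + b * c)

eps = 0.001
# Finite-difference check of dJ/dc.
dJ_dc = (forward(5, 3, 2 + eps) - forward(5, 3, 2)) / eps
print(dJ_dc)  # approximately 9, i.e. 3 * b
```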

  • One notational convention when writing code to implement backward propagation: we'll use dVar to represent the derivative of the final output (J in this case) with respect to the various intermediate quantities (a, b, c, u, v, etc.).

There is usually some final output variable that we really care about or want to optimize. Much of the computation will be devoted to the derivatives of that final output variable (J in this case) with respect to the various intermediate variables a, b, c, u, v, written \frac{dFinalOutputVar}{dVar}. Then, what do you call this long variable name in code? Because we always take derivatives with respect to the same final output variable, we'll just write dVar to represent that quantity.
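Under this convention, one backward pass for the example graph might look like the following sketch, where each variable da, db, dc, du, dv holds dJ/dVar:

```python
a, b, c = 5, 3, 2

# Forward pass (left to right through the graph).
u = b * c        # 6
v = a + u        # 11
J = 3 * v        # 33

# Backward pass (right to left): dVar means dJ/dVar.
dv = 3           # dJ/dv, since J = 3v
da = dv * 1      # dJ/da = dJ/dv * dv/da
du = dv * 1      # dJ/du = dJ/dv * dv/du
db = du * c      # dJ/db = dJ/du * du/db
dc = du * b      # dJ/dc = dJ/du * du/dc
print(da, db, dc)   # 3 6 9
```

Note how each dVar is computed from the dVar of the node to its right, which is exactly the right-to-left order of backward propagation.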

The takeaway:

The most efficient way to compute derivatives is a right-to-left backward propagation, following the direction of the red arrows.

<end>

 
