Basics of Neural Network Programming - Derivatives with a Computation Graph


In the last class, we worked through an example of using a computation graph to compute a function J. In this class, let's see how to use that graph to compute the derivatives of J.

Following is the computation graph:

Figure-1: the computation graph for J=3v, where v=a+u and u=bc (with a=5, b=3, c=2)
  • We have:

\frac{dJ}{dv}=3

If v is nudged from 11 to 11.001, then J increases from 33 to 33.003.
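This nudge can be checked numerically with a finite difference. A minimal Python sketch, assuming only what the graph gives us (v = 11 and J = 3v):

```python
# J depends on v as J = 3v (from the computation graph).
def J_of_v(v):
    return 3 * v

v = 11
eps = 0.001
# Finite-difference estimate of dJ/dv: nudge v and measure the change in J.
dJ_dv = (J_of_v(v + eps) - J_of_v(v)) / eps
print(dJ_dv)  # approximately 3, up to floating-point error
```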

  • Then, what is \frac{dJ}{da}?

Suppose a increases from 5 to 5.001. Then v=a+u=5.001+6=11.001, so J=3v=33.003.

So a change in the variable a propagates rightward through the computation graph, and the increase in J is 3x the increase in a. So we have:

\frac{dJ}{da}=3

Actually, this is just the chain rule from calculus:

\frac{dJ}{da}=\frac{dJ}{dv}\times\frac{dv}{da}=3\times1=3

So this little illustration shows how, having already computed \frac{dJ}{dv}, you can use it to compute \frac{dJ}{da}. This is one step of backward propagation.
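The chain-rule step can be sketched in Python as well; the full forward function below, J = 3(a + bc), is assumed from the graph:

```python
# Forward computation implied by the graph: u = bc, v = a + u, J = 3v.
def forward(a, b, c):
    u = b * c
    v = a + u
    return 3 * v   # J

a, b, c = 5, 3, 2
eps = 0.001
# Finite-difference check of dJ/da.
dJ_da_numeric = (forward(a + eps, b, c) - forward(a, b, c)) / eps
# Chain rule: dJ/da = dJ/dv * dv/da = 3 * 1.
dJ_da_chain = 3 * 1
print(dJ_da_numeric, dJ_da_chain)  # both approximately 3
```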

  • Then what is \frac{dJ}{du}?

\frac{dJ}{du}=\frac{dJ}{dv}\times\frac{dv}{du}=3\times1=3

If u bumps up from 6 to 6.001, then v goes up from 11 to 11.001, so J goes from 33 to 33.003.
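A quick numerical confirmation of this bump, holding a = 5 fixed as in the example:

```python
a = 5
# v = a + u and J = 3v, so J as a function of u alone:
def J_of_u(u):
    return 3 * (a + u)

eps = 0.001
# Bump u from 6 to 6.001 and measure the change in J.
dJ_du = (J_of_u(6 + eps) - J_of_u(6)) / eps
print(dJ_du)  # approximately 3
```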

  • What is \frac{dJ}{db}?

\frac{dJ}{db}=\frac{dJ}{dv}\times\frac{dv}{du}\times\frac{du}{db}=3\times1\times c=3\times2=6

If b bumps up from 3 to 3.001, then u goes to 6.002 (since c=2), v goes to 11.002, and J goes to 33.006.
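This two-step propagation (b through u and v to J) can be verified the same way; the forward function is again assumed from the graph:

```python
# Forward computation from the graph: u = bc, v = a + u, J = 3v.
def forward(a, b, c):
    u = b * c        # b = 3.001 gives u = 6.002, since c = 2
    v = a + u
    return 3 * v

eps = 0.001
# Finite-difference check of dJ/db.
dJ_db = (forward(5, 3 + eps, 2) - forward(5, 3, 2)) / eps
print(dJ_db)  # approximately 6, i.e. 3 * c
```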

  • Last, what is \frac{dJ}{dc}?

\frac{dJ}{dc}=\frac{dJ}{du}\times\frac{du}{dc}=3\times b=3\times3=9
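By symmetry with the previous step, a nudge in c is amplified by b = 3 through u, then by 3 through J. A quick check:

```python
# Forward computation from the graph, written in one line: J = 3(a + bc).
def forward(a, b, c):
    return 3 * (a + b * c)

eps = 0.001
# Finite-difference check of dJ/dc.
dJ_dc = (forward(5, 3, 2 + eps) - forward(5, 3, 2)) / eps
print(dJ_dc)  # approximately 9, i.e. 3 * b
```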

  • One notational convention when writing code to implement backward propagation: we'll use dVar to represent the derivative of the final output (J in this case) with respect to the various intermediate quantities (a, b, c, u, v, etc.).

There is usually some final output variable that we really care about or want to optimize. Much of the computation will be devoted to the derivatives of that final output variable (J in this case) with respect to the various intermediate variables a, b, c, u, v, written \frac{dFinalOutputVar}{dVar}. Then, what do you call this long variable name in code? Because we always take derivatives with respect to the same final output variable, we'll just write dVar to represent that quantity.
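Under this convention, one backward pass for the example graph might look like the following sketch, where each variable da, db, dc, du, dv holds dJ/dVar:

```python
a, b, c = 5, 3, 2

# Forward pass (left to right through the graph).
u = b * c        # 6
v = a + u        # 11
J = 3 * v        # 33

# Backward pass (right to left): dVar means dJ/dVar.
dv = 3           # dJ/dv, since J = 3v
da = dv * 1      # dJ/da = dJ/dv * dv/da
du = dv * 1      # dJ/du = dJ/dv * dv/du
db = du * c      # dJ/db = dJ/du * du/db
dc = du * b      # dJ/dc = dJ/du * du/dc
print(da, db, dc)   # 3 6 9
```

Note how each dVar is computed from the dVar of the node to its right, which is exactly the right-to-left order of backward propagation.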

The takeaway:

The most efficient way to compute derivatives is a right-to-left backward propagation, following the direction of the red arrows.

<end>

 
