Coursera | Andrew Ng (01-week-2-2.7&2.8): Computation Graph & Derivatives with a Computation Graph

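This series adds personal study notes, or supplementary derivations, to selected points of the original course. If you find errors, corrections are welcome. After studying Andrew Ng's course, I organized it into text so that it is easier to look up and review. Because I am still learning English myself, this series is mainly in English, and I suggest readers also rely mainly on the English, which lays the groundwork for reading academic papers in related fields later on. - ZJ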

Coursera course | deeplearning.ai | NetEase Cloud Classroom

When reposting, please credit the author and source: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/JUNJUN_ZHAO/article/details/78897414


2.7 Computation Graph

(Subtitle source: NetEase Cloud Classroom)

I've said before that the computations of a neural network are organized in terms of a forward pass, or forward propagation step, in which we compute the output of the neural network, followed by a backward pass, or backpropagation step, which we use to compute gradients or derivatives. The computation graph explains why it is organized this way. In this video we'll go through an example to illustrate the computation graph. Let's use a simpler example than logistic regression or a full neural network.

Let's say that we're trying to compute a function J, which is a function of three variables a, b, and c, and let's say that function is J(a, b, c) = 3(a + bc). Computing this function actually has three distinct steps. The first is to compute b times c, and let's say we store that in a variable called u, so u = bc. Then you compute v = a + u (the original narration misspoke here and said "a times u"). Finally, your output is J = 3v. This is the final function J you're trying to compute. We can take these three steps and draw them in a computation graph as follows. Let's say I draw your three variables a, b, and c here. The first thing we did was compute u = bc.

I'm going to put a rectangular box around that, and the inputs to it are b and c. Then you have v = a + u, and the inputs to that are u, which we just computed, together with a. Finally, we have J = 3v. As a concrete example, if a = 5, b = 3, and c = 2, then u = bc is 6, v = a + u is 5 + 6 = 11, and J = 3v, so J = 33. You can verify that this is 3 × (5 + 3 × 2), and if you expand that out you do get 33 as the value of J. The computation graph comes in handy when there is some distinguished or special output variable, such as J in this case, that you want to optimize. In the case of logistic regression, J is of course the cost function that we're trying to minimize. What we've seen in this little example is that through a left-to-right pass you can compute the value of J.
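The left-to-right pass can be sketched in a few lines of Python. This is only an illustration of the three steps from the example; the function name `forward` is my own choice and does not come from the course.

```python
def forward(a, b, c):
    """Forward (left-to-right) pass for J = 3 * (a + b * c)."""
    u = b * c   # first node of the graph
    v = a + u   # second node
    J = 3 * v   # output node
    return u, v, J


u, v, J = forward(5, 3, 2)
print(u, v, J)  # 6 11 33
```

Keeping the intermediate values u and v around, rather than computing J in a single expression, is what makes the later right-to-left derivative pass straightforward.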

What we'll see in the next couple of slides is that, in order to compute derivatives, a right-to-left pass like this, going in the opposite direction to the blue arrows, is the most natural way to do it. To recap, the computation graph organizes a computation with this blue-arrow, left-to-right computation. In the next video we'll look at how you can do the backward, red-arrow, right-to-left computation of the derivatives. Let's go on to the next video.


2.8 Derivatives with a Computation Graph

In the last video, we worked through an example of using a computation graph to compute the function J. Now let's take a cleaned-up version of that computation graph and show how you can use it to figure out derivative calculations for that function J. So, here's a computation graph. Let's say you want to compute the derivative of J with respect to v. What is that? It asks: if we were to take this value of v and change it a little bit, how would the value of J change? J is defined as 3v, and right now v = 11. If we bump v up by a little bit to 11.001, then J, which is 3v and currently 33, gets bumped up to 33.003. So here we've increased v by 0.001, and the net result is that J goes up by three times as much. So the derivative of J with respect to v is equal to 3, because the increase in J is three times the increase in v.
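The "bump v by 0.001" argument can be tried directly. The sketch below is a plain finite-difference estimate, not the course's code; `J_of_v` and `eps` are names I made up for the illustration, and the result is rounded because floating-point arithmetic leaves a tiny error.

```python
def J_of_v(v):
    """Last node of the graph: J = 3v."""
    return 3 * v


v = 11.0
eps = 0.001  # the small nudge used in the lecture

# Change in J divided by change in v.
slope = (J_of_v(v + eps) - J_of_v(v)) / eps
print(round(slope, 6))  # 3.0
```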

In fact, this is very analogous to the example we had in an earlier video, where we had f(a) = 3a, and we derived that df(a)/da, which in slightly simplified and slightly sloppy notation you can write as df/da, was equal to 3. Here, instead, we have J = 3v, and so dJ/dv = 3, with J playing the role of f and v playing the role of a in that earlier example. In the terminology of backpropagation, what we've seen is that if you want to compute the derivative of the final output variable, which is usually the variable you care most about, with respect to v, then we've done one step of backpropagation: one step backwards in this graph.

Now let's look at another example. What is dJ/da? In other words, if we bump up the value of a, how does that affect the value of J? Let's go through the example. The variable a is equal to 5, so let's bump it up to 5.001. The net impact is that v, which was a + u and previously 11, increases to 11.001. And we've already seen above that J then gets bumped up to 33.003. So what we've seen is that if you increase a by 0.001, J increases by 0.003. By increasing a, I mean that if you take this value 5 and plug in the new value, the change to a propagates to the right of the computation graph, so that J ends up being 33.003. The increase in J is three times the increase in a, which means this derivative is equal to 3.
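To watch the change in a propagate to the right of the graph, one can rerun the whole forward computation with a = 5.001. This is a small sketch under the example's values; the helper name `forward` and the variable names are assumptions of mine, and results are rounded to hide floating-point noise.

```python
def forward(a, b, c):
    """Forward pass of the example graph, returning only J."""
    u = b * c
    v = a + u
    return 3 * v


eps = 0.001
base = forward(5.0, 3.0, 2.0)          # J at a = 5
nudged = forward(5.0 + eps, 3.0, 2.0)  # J at a = 5.001

print(round(nudged - base, 6))          # 0.003
print(round((nudged - base) / eps, 6))  # 3.0
```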

One way to break this down is to say that if you change a, that changes v, and through changing v, that changes J. So the net change to the value of J when you nudge the value of a up a little bit works like this: first, by changing a you end up increasing v. How much does v increase? It increases by an amount determined by dv/da. Then the change in v causes the value of J to increase as well. In calculus this is called the chain rule: if a affects v, which affects J, then the amount that J changes when you nudge a is the product of how much v changes when you nudge a and how much J changes when you nudge v. What we saw from this calculation is that if you increase a by 0.001, v changes by the same amount. So dv/da = 1.

So in fact, if you plug in what we worked out previously, dJ/dv = 3 and dv/da = 1, the product of these, 3 × 1, gives you the correct value: dJ/da = 3. This little illustration shows how having computed dJ/dv, the derivative with respect to this variable, can then help you compute dJ/da. That's another step of this backward calculation.
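Written out as a formula, this chain-rule step reads as follows. It only restates the numbers from the example, using the definitions J = 3v and v = a + u.

```latex
\frac{dJ}{da}
  = \frac{dJ}{dv} \cdot \frac{dv}{da}
  = 3 \cdot 1
  = 3,
\qquad \text{where } J = 3v \text{ and } v = a + u .
```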

I just want to introduce one more notational convention. When you're writing code to implement backpropagation, there will usually be some final output variable that you really care about or want to optimize. In this case, this final output variable is J. It's really the last node in your computation graph. So a lot of the computations will be trying to compute the derivative of that final output variable with respect to some other variable, that is, d(final output variable)/d(var). Let me just call that dvar.

So a lot of the computations you do will be to compute the derivative of the final output variable, J in this case, with respect to various intermediate variables such as a, b, c, u, and v. When you implement this in software, what do you call the variable that holds this value? One thing you could do in Python is write a very long variable name that spells out d(FinalOutputVar)/d(var), but that's a very long variable name. You could shorten it to dJdvar. But you're always taking the derivative of J, this final output variable, so even that part of the name is redundant.

I'm going to introduce a new notation: in the code you write, when you're computing this quantity, we'll just use the variable name dvar to represent it. So dvar in your code represents the derivative of the final output variable you care about, such as J (or sometimes the loss L), with respect to the various intermediate quantities you're computing in your code. So for this quantity here, you use dv in your code to denote the value, and dv would be equal to 3. Likewise, your code represents dJ/da as da, which we also figured out to be equal to 3. So we've done backpropagation partially through this computation graph.
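In code, the naming convention might look like the sketch below. The values come from the example; the intermediate name `dv_da` for the local derivative is my own addition and is not part of the convention itself.

```python
# Convention: the variable dvar holds dJ/dvar, the derivative of the
# final output J with respect to var.

dv = 3.0          # dJ/dv, because J = 3 * v

dv_da = 1.0       # local derivative dv/da, because v = a + u (hypothetical name)
da = dv * dv_da   # dJ/da = dJ/dv * dv/da

print(dv, da)  # 3.0 3.0
```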

Let's go through the rest of this example on the next slide, using a cleaned-up copy of the computation graph. Just to recap, what we've done so far is go backward here and figure out that dv = 3. Again, dv is just a variable name in the code; it really stands for dJ/dv. I've also figured out that da = 3, where da is the variable name in your code for the value of dJ/da. And I've sort of hand-waved how we've gone backwards along these two edges. Now let's keep computing derivatives. Let's look at the value u. What is dJ/du?

Through a similar calculation to what we did before, we start off with u = 6. If you bump u up to 6.001, then v, which was previously 11, goes up to 11.001, and so J goes from 33 to 33.003. The increase in J is three times the increase in u, so dJ/du = 3. The analysis for u is very similar to the analysis we did for a. This is computed as dJ/dv times dv/du, where dJ/dv we had already figured out to be 3, and dv/du turns out to be 1. So with one more step of backpropagation, we end up computing that du is also equal to 3, where du is of course just dJ/du.

Now let's step through one last example in detail. What is dJ/db? Imagine you are allowed to change the value of b, and you want to tweak b a little bit in order to minimize or maximize the value of J. What is the derivative, that is, the slope of this function J when you change the value of b a little bit? It turns out that, using the chain rule from calculus, this can be written as the product of two things: dJ/du times du/db. The reasoning is that if you change b a little bit, say from 3 to 3.001, the way it affects J is that it first affects u. So how much does it affect u?

Well, u is defined as b times c. So it goes from 6, when b = 3, to 6.002, because c = 2 in our example. This tells us that du/db = 2, because when you bump b up by 0.001, u increases by twice as much. So now we know that u has gone up twice as much as b has gone up. What is dJ/du? We've already figured out that this is equal to 3, and so by multiplying these two parts we find that dJ/db = 6.
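The step for b can be checked in the same style. The sketch below multiplies the two factors from the text and then compares against a direct nudge of b; `J`, `du_db`, and `eps` are names chosen for this illustration only.

```python
def J(a, b, c):
    """The example function J = 3 * (a + b * c)."""
    return 3 * (a + b * c)


a, b, c = 5.0, 3.0, 2.0

du = 3.0         # dJ/du, found in the previous step
du_db = c        # u = b * c, so du/db = c = 2 (hypothetical name)
db = du * du_db  # chain rule: dJ/db = dJ/du * du/db
print(db)        # 6.0

# Direct check: nudge b by 0.001 and see how much J moves.
eps = 0.001
print(round(J(a, b + eps, c) - J(a, b, c), 6))  # 0.006
```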

Again, here's the reasoning for the second part of the argument: we want to know, when u goes up by 0.002, how that affects J. The fact that dJ/du = 3 tells us that when u goes up by 0.002, J goes up three times as much, so J should go up by 0.006. If you check the math in detail, you will find that if b becomes 3.001, then u becomes 6.002, v becomes 11.002 (that's a + u, or 5 + u), and J, which is equal to 3v, becomes 33.006. That's how you get dJ/db = 6. To fill that in as we go backwards, db = 6, where db is really the Python variable name for dJ/db. I won't go through the last example in great detail, but if you also compute dJ/dc, it turns out to be dJ/du times du/dc, which is 3 × 3 = 9. So through this last step, it is possible to derive that dc = 9.
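Putting all five derivatives together, a right-to-left pass over this particular graph could be sketched as follows. The function name `backward` and the dictionary return value are my own choices for illustration; the derivative values are the ones worked out in the example.

```python
def backward(b, c):
    """Right-to-left pass for J = 3v, v = a + u, u = b * c.

    Each dvar below means dJ/dvar. The input a is not needed because
    none of the local derivatives in this graph depend on it.
    """
    dv = 3.0       # J = 3v        -> dJ/dv = 3
    da = dv * 1.0  # v = a + u     -> dv/da = 1
    du = dv * 1.0  # v = a + u     -> dv/du = 1
    db = du * c    # u = b * c     -> du/db = c
    dc = du * b    # u = b * c     -> du/dc = b
    return {"dv": dv, "da": da, "du": du, "db": db, "dc": dc}


grads = backward(b=3.0, c=2.0)
print(grads)  # {'dv': 3.0, 'da': 3.0, 'du': 3.0, 'db': 6.0, 'dc': 9.0}
```

Note how dv is computed once and reused for da and du, and du is reused for db and dc. That reuse is the reason the right-to-left order is efficient.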

So the key takeaway from this video, from this example, is that when computing all of these derivatives, the most efficient way to do so is through a right-to-left computation following the direction of the red arrows. In particular, we first compute the derivative with respect to v, and that becomes useful for computing the derivative with respect to a and the derivative with respect to u. Then the derivative with respect to u, this term over here and this term over here, in turn becomes useful for computing the derivative with respect to b and the derivative with respect to c. So that was a computation graph: there's a forward, or left-to-right, calculation to compute the cost function, such as J, that you might want to optimize, and a backward, or right-to-left, calculation to compute derivatives.
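One way to gain confidence in hand-derived values like these is to compare them with finite-difference estimates. The check below is not something the lecture does; it is a common sanity test, and `numeric_grad` is a hypothetical helper written for this note.

```python
def J(a, b, c):
    """The example function J = 3 * (a + b * c)."""
    return 3 * (a + b * c)


def numeric_grad(f, args, i, eps=1e-6):
    """Estimate the derivative of f with respect to its i-th argument."""
    bumped = list(args)
    bumped[i] += eps
    return (f(*bumped) - f(*args)) / eps


args = (5.0, 3.0, 2.0)      # a, b, c from the example
analytic = (3.0, 6.0, 9.0)  # da, db, dc from the right-to-left pass

for i, expected in enumerate(analytic):
    estimate = numeric_grad(J, args, i)
    assert abs(estimate - expected) < 1e-4
print("finite-difference estimates agree with da, db, dc")
```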

If you're not familiar with calculus or the chain rule, I know some of those details went by really quickly. But if you didn't follow all the details, don't worry about it. In the next video, we'll go over this again in the context of logistic regression, and show you exactly what you need to do to implement the computations needed to compute derivatives for the logistic regression model.


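PS: You are welcome to follow the WeChat official account 「SelfImprovementLab」, which focuses on deep learning, machine learning, and artificial intelligence, and from time to time organizes check-in groups for early rising, reading, exercise, English, and other activities.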
