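This series adds my personal study notes, plus some supplementary derivations, to selected points of the original course. If you find mistakes, corrections are welcome. After studying Andrew Ng's course, I wrote it up as text so that it is easier to look up and review. Because I am also studying English, the series is mainly in English, and I suggest that readers likewise treat the English as primary and the Chinese as an aid, as groundwork for reading academic papers in related fields later on. - ZJ
Please credit the author and source when reposting: ZJ, WeChat public account 「SelfImprovementLab」
Zhihu: https://zhuanlan.zhihu.com/c_147249273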
CSDN:http://blog.csdn.net/JUNJUN_ZHAO/article/details/78897414
2.7 Computation Graph
Computation graphs and computing derivatives with a computation graph
(Subtitles source: NetEase Cloud Classroom, 网易云课堂)
You could say that the computations of a neural network are organized in terms of a forward pass, or forward propagation step, in which we compute the output of the neural network, followed by a backward pass, or backpropagation step, which we use to compute gradients or derivatives. The computation graph explains why it is organized this way. In this video, we'll go through an example to illustrate the computation graph. Let's use a simpler example than logistic regression or a full-blown neural network.
Let's say that we're trying to compute a function J, which is a function of three variables a, b, and c, and let's say that function is J(a, b, c) = 3(a + bc). Computing this function actually has three distinct steps. The first is to compute b times c, and let's say we store that in a variable called u, so u = bc. Then you might compute v = a + u. And then finally your output J is 3 times v, so J = 3v. This is the final function J that you are trying to compute. We can take these three steps and draw them in a computation graph as follows. Let's say I draw your three variables a, b, and c here. The first thing we did was compute u = bc.
I'm going to put a rectangular box around that, and the inputs to that are b and c. Then you have v = a + u, so the inputs to that are u, which we just computed, together with a. And then finally we have J = 3v. As a concrete example, if a = 5, b = 3, and c = 2, then u = bc would be 6, v = a + u would be 5 + 6 = 11, and J = 3v would be 33.
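The three steps above can be sketched in Python as a left-to-right forward pass. This is only an illustration: the function name `forward` is an assumption of this sketch, while the variable names a, b, c, u, v, and J follow the lecture.

```python
def forward(a, b, c):
    """Forward (left-to-right) pass for J = 3 * (a + b * c)."""
    u = b * c   # first node:  u = bc
    v = a + u   # second node: v = a + u
    J = 3 * v   # output node: J = 3v
    return u, v, J


# The lecture's concrete example: a = 5, b = 3, c = 2.
u, v, J = forward(a=5, b=3, c=2)
print(u, v, J)  # 6 11 33
```

Keeping the intermediate values u and v around, rather than computing J in a single expression, is what makes the later right-to-left derivative computation straightforward.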
What we'll see in the next couple of slides is that, in order to compute derivatives, a right-to-left pass like this, going in the opposite direction to the blue arrows, is the most natural way to compute them. So to recap, the computation graph organizes a computation with this blue-arrow, left-to-right computation. In the next video, we'll look at how you can do the backward, red-arrow, right-to-left computation of the derivatives. Let's go on to the next video.
2.8 Derivatives with a Computation Graph
In the last video, we worked through an example of using a computation graph to compute the function J. Now, let's take a cleaned-up version of that computation graph and show how you can use it to figure out derivative calculations for that function J.
Start with dJ/dv. Because J = 3v, nudging v from 11 to 11.001 moves J from 33 to 33.003, so J rises three times as much as v does. In fact, this is very analogous to the example from the previous video, where we had f(a) = 3a and derived that df(a)/da (or, in slightly simplified and slightly sloppy notation, df/da) was equal to 3. Here, instead, we have J = 3v, and so dJ/dv = 3.
Now, let's look at another example. What is dJ/da? In other words, if we bump up the value of a, how does that affect the value of J? Let's go through the example. The variable a is equal to 5, so let's bump it up to 5.001. The net impact of that is that v, which is a + u and was previously 11, increases to 11.001. And as we have already seen above, J then increases to 33.003. So the increase in J is three times the increase in a, which means dJ/da = 3.
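The "bump a to 5.001" argument can be checked numerically with a finite difference. The sketch below is an illustration only; the helper name `J_of` and the step size `eps` are assumptions of this example, not names from the lecture.

```python
def J_of(a, b, c):
    """The lecture's function J = 3 * (a + b * c)."""
    return 3 * (a + b * c)


eps = 0.001                       # the small nudge applied to a
J_before = J_of(5, 3, 2)          # a = 5
J_after = J_of(5 + eps, 3, 2)     # a = 5.001

# Change in J divided by the change in a approximates dJ/da.
dJ_da_estimate = (J_after - J_before) / eps
print(J_before, round(J_after, 3), round(dJ_da_estimate, 3))
```

Because J is linear in a, the estimate matches the true derivative of 3 up to floating-point rounding.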
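One way to break this down is to say that if you change a, that changes v, and through changing v, that changes J. So the net change to the value of J when you nudge a comes in two stages: first a changes v, by an amount determined by dv/da, and then the change in v changes J, by an amount determined by dJ/dv. In calculus this is the chain rule: dJ/da = dJ/dv × dv/da.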
In fact, if you plug in what we worked out previously, dJ/dv = 3 and dv/da = 1, so the product is 3 × 1. That gives you the correct value, dJ/da = 3. This little illustration shows how having computed dJ/dv, the derivative with respect to this variable, helps you compute dJ/da. So that is another step of this backward calculation.
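Written out as a worked equation, with the values taken from the example (J = 3v and v = a + u), this step reads:

```latex
\frac{dJ}{da}
  = \frac{dJ}{dv}\cdot\frac{dv}{da}
  = \frac{d(3v)}{dv}\cdot\frac{d(a+u)}{da}
  = 3 \times 1
  = 3
```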
I want to introduce one more notational convention. When you're writing code to implement backpropagation, there will usually be some final output variable that you really care about or want to optimize. In this case, the final output variable is J; it is the last node in your computation graph. So a lot of the computation will be computing the derivative of that final output variable with respect to some other variable, that is, d(final output variable)/d(var) for some variable var.
So a lot of the computation you do will be to compute the derivative of the final output variable, J in this case, with respect to various intermediate variables such as a, b, c, u, and v. When you implement this in software, what do you call the variable that holds it? One thing you could do in Python is write a very long variable name, something like dFinalOutputVar_dvar. But that is a very long variable name.
So I'm going to introduce a new notation: in the code you write, we'll just use the variable name dvar to represent that quantity. So dvar in your code will represent the derivative of the final output variable you care about, such as J (or sometimes the loss L), with respect to the various intermediate quantities you compute in your code. So for this quantity here, dJ/dv, your code uses dv to denote the value, and dv would be equal to 3. Likewise, your code represents dJ/da as da, which we also figured out to be equal to 3. So we've done backpropagation partially through this computation graph.
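In code, the naming convention might look like the following minimal sketch. The values come from the lecture's example; writing the local derivatives as the literal `1.0` is a choice made for this illustration.

```python
# Forward pass with the lecture's example values.
a, b, c = 5, 3, 2
u = b * c      # 6
v = a + u      # 11
J = 3 * v      # 33

# Convention: the variable "dvar" holds dJ/dvar, the derivative of the
# final output J with respect to var.
dv = 3.0         # dJ/dv, because J = 3v
da = dv * 1.0    # dJ/da = dJ/dv * dv/da, and dv/da = 1 because v = a + u

print(dv, da)  # 3.0 3.0
```

Since every derivative in the program is taken of the same final output, the short names lose no information.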
Let's go through the rest of this example on the next slide. Here is a cleaned-up copy of the computation graph. Just to recap, what we've done so far is go backward here and figure out that dv = 3. Again, dv is just a variable name in the code; it is really dJ/dv. I also figured out that da = 3, and again, da is the variable name in your code, and it is really the value of dJ/da. I've sort of hand-waved how we went backward along these two edges. Now, let's keep computing derivatives. Let's look at the value u. What is dJ/du?
Well, through a similar calculation to what we did before, we start off with u = 6. If you bump u up to 6.001, then v, which was previously 11, goes up to 11.001, and so J goes from 33 to 33.003. The increase in J is three times the increase in u, so dJ/du = 3. The analysis for u is very similar to the analysis we did for a. This is actually computed as dJ/dv × dv/du, where dJ/dv we had already figured out to be 3, and dv/du turns out to be equal to 1. So with one more step of backpropagation, we end up computing that du is also equal to 3, and du is of course just dJ/du.
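To nudge u directly, the sketch below treats u as a free input to the rest of the graph, which is an assumption made purely for illustration (in the real graph u is computed from b and c). The helper name `J_from_a_u` is likewise hypothetical.

```python
def J_from_a_u(a, u):
    """The part of the graph downstream of u: v = a + u, then J = 3v."""
    v = a + u
    return 3 * v


eps = 0.001
# Nudge u from 6 to 6.001 while holding a = 5 fixed.
du_estimate = (J_from_a_u(5, 6 + eps) - J_from_a_u(5, 6)) / eps

# The same quantity by the chain rule, reusing the already computed dv.
dv = 3.0         # dJ/dv
du = dv * 1.0    # dJ/du = dJ/dv * dv/du, and dv/du = 1

print(round(du_estimate, 3), du)
```

The chain-rule line needs only one multiplication because dv was computed in the previous backward step.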
Now, let's step through one last example in detail. What is dJ/db? Imagine you are allowed to change the value of b, and you want to tweak b a little bit in order to minimize or maximize the value of J. So what is the derivative, the slope of this function J, when you change the value of b a little bit? It turns out that, using the chain rule from calculus, this can be written as the product of two things: dJ/db = dJ/du × du/db. The reasoning is that if you change b a little bit, so that b goes from 3 to, say, 3.001, the way it affects J is that it first affects u. So how much does it affect u?
Well, u is defined as b times c, right? So u goes from 6, when b = 3, to 6.002, because c = 2 in our example. This tells us that du/db = 2, because when you bump b up by 0.001, u increases twice as much. So now we know that u has gone up twice as much as b has gone up. What is dJ/du? We've already figured out that this is equal to 3, and so by multiplying these two parts, we find that dJ/db = 6.
Again, here's the reasoning for the second part of the argument. We want to know, when u goes up by 0.002, how does that affect J? The fact that dJ/du = 3 tells us that when u goes up by 0.002, J goes up three times as much, so J should go up by 0.006. If you check the math in detail, you will find that if b becomes 3.001, then u becomes 6.002, v becomes 11.002 (that's a + u, so 5 + u), and then J, which is equal to 3 times v, ends up being 33.006. That's how you get dJ/db = 6. To fill that in as we go backward, db = 6, where db is really the Python variable name for dJ/db. I won't go through the last example in great detail, but it turns out that if you also compute dJ/dc, it is dJ/du × du/dc, and this turns out to be 9, which is just 3 × 3. So through this last step, it is possible to derive that dc = 9.
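Both results can be sanity-checked by nudging b and c in turn. As before, this is a sketch under assumptions: `J_of` and `eps` are illustrative names, not part of the lecture.

```python
def J_of(a, b, c):
    """The lecture's function J = 3 * (a + b * c)."""
    return 3 * (a + b * c)


eps = 0.001
base = J_of(5, 3, 2)  # 33

# Nudge b from 3 to 3.001: u goes to 6.002, v to 11.002, J to 33.006.
db_estimate = (J_of(5, 3 + eps, 2) - base) / eps

# Nudge c from 2 to 2.001: u goes to 6.003, v to 11.003, J to 33.009.
dc_estimate = (J_of(5, 3, 2 + eps) - base) / eps

print(round(db_estimate, 3), round(dc_estimate, 3))
```

The estimates agree with the chain-rule values db = 3 × 2 = 6 and dc = 3 × 3 = 9, up to floating-point rounding.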
So the key takeaway from this video, from this example, is that when computing all of these derivatives, the most efficient way to do so is through a right-to-left computation following the direction of the red arrows. In particular, we first compute the derivative with respect to v, and that then becomes useful for computing the derivative with respect to a and the derivative with respect to u. The derivative with respect to u, in turn, becomes useful for computing the derivative with respect to b and the derivative with respect to c. So that was the computation graph: there is a forward, left-to-right calculation to compute the cost function, such as J, that you might want to optimize, and a backward, right-to-left calculation to compute derivatives.
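Putting the two passes together, the whole example can be sketched as one function. The function name `forward_backward` and the returned dictionary are assumptions of this illustration; the variable names dv, da, du, db, and dc follow the lecture's dvar convention.

```python
def forward_backward(a, b, c):
    """Forward then backward pass for J = 3 * (a + b * c)."""
    # Forward pass, left to right.
    u = b * c
    v = a + u
    J = 3 * v

    # Backward pass, right to left. Each line reuses a derivative
    # computed just before it, as the chain rule prescribes.
    dv = 3.0         # dJ/dv, because J = 3v
    da = dv * 1.0    # dJ/da = dJ/dv * dv/da, with dv/da = 1
    du = dv * 1.0    # dJ/du = dJ/dv * dv/du, with dv/du = 1
    db = du * c      # dJ/db = dJ/du * du/db, with du/db = c
    dc = du * b      # dJ/dc = dJ/du * du/dc, with du/dc = b
    return J, {"dv": dv, "da": da, "du": du, "db": db, "dc": dc}


J, grads = forward_backward(5, 3, 2)
print(J, grads)
```

Going right to left means dv is computed once and shared by da and du, and du is computed once and shared by db and dc.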
If you're not familiar with calculus or the chain rule, I know some of those details went by really quickly. If you didn't follow all the details, don't worry about it. In the next video, we'll go over this again in the context of logistic regression, and show you exactly what you need to do to implement the computations needed to compute derivatives for the logistic regression model.