Coursera | Andrew Ng (01-week-3-3.3): Computing a Neural Network's Output

This series only adds my personal study notes and some supplementary derivations on top of the original course material; if you find any errors, corrections are welcome. Having worked through Andrew Ng's course, I organized it into text to make review and lookup easier. Since I am also studying English, the series is primarily in English, and I suggest readers rely on the English first, with the Chinese as support, as groundwork for reading academic papers in this field later on. - ZJ

Coursera course | deeplearning.ai | 网易云课堂 (NetEase Cloud Classroom)


Please credit the author and source when reposting: ZJ, WeChat official account "SelfImprovementLab"

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/78977757


3.3 Computing a Neural Network's Output

(Subtitle source: 网易云课堂 / NetEase Cloud Classroom)

In the last video you saw what a single-hidden-layer neural network looks like. In this video let's go through the details of exactly how this neural network computes its output. What you'll see is that it is like logistic regression, but repeated many times. So let's take a look: this is a two-layer neural network. Let's go more deeply into exactly what this neural network computes. We said before that for logistic regression, the circle in the diagram really represents two steps of computation: first you compute z, and second you compute the activation a as the sigmoid function of z. A neural network just does this many more times. Let's start by focusing on just one of the nodes in the hidden layer, the first node in the hidden layer.

I've grayed out the other nodes for now. Similar to logistic regression on the left, this node in the hidden layer does two steps of computation. The first step, which we can think of as the left half of the node, computes $z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1$. These are all quantities associated with the first hidden layer, which is why they carry the superscript $[1]$, and this is the first node in the hidden layer, which is why they carry the subscript 1. The second step computes $a^{[1]}_1 = \sigma(z^{[1]}_1)$. For both z and a, the notational convention is $a^{[l]}_i$: the superscript $[l]$ in square brackets refers to the layer number, and the subscript $i$ refers to the node in that layer.
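
To make these two steps concrete, here is a minimal NumPy sketch of a single hidden unit; the input values and parameters below are made-up placeholders, not numbers from the lecture.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1 / (1 + np.exp(-z))

x = np.array([[1.0], [2.0], [3.0]])      # input features, shape (3, 1) -- placeholder values
w1_1 = np.array([[0.1], [0.2], [0.3]])   # w^[1]_1, parameters of hidden unit 1 -- placeholder values
b1_1 = 0.5                               # b^[1]_1, bias of hidden unit 1 -- placeholder value

z1_1 = w1_1.T @ x + b1_1                 # step 1: z^[1]_1 = w^[1]T_1 x + b^[1]_1
a1_1 = sigmoid(z1_1)                     # step 2: a^[1]_1 = sigmoid(z^[1]_1)
print(z1_1.item(), a1_1.item())          # two scalars: the pre-activation and the activation
```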

So the node we've been looking at is node 1 in layer 1, the hidden layer, which is why both the superscript and the subscript are 1. That little circle, the first node in the neural network, represents carrying out these two steps of computation. Now let's look at the second node, the second node in the hidden layer of the neural network. Similar to the logistic regression unit on the left, this little circle also represents two steps of computation: the first step computes $z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2$ (still layer 1, but now the second node), and then $a^{[1]}_2 = \sigma(z^{[1]}_2)$. Feel free to pause the video if you want to double-check that the superscript and subscript notation is consistent with what we wrote above in purple. We've talked through the first two hidden units of the neural network; hidden units three and four represent the same computations.

So now let me take this pair of equations and this pair of equations and copy them to the next slide. Here's our network, and here are the first and second equations that we worked out previously for the first and second hidden units. If you then go through and write out the corresponding equations for the third and fourth hidden units, you get the following. Let's make sure this notation is clear: this is the vector $w^{[1]}_1$ transposed times x, so the superscript T there represents a vector transpose. Now, as you might have guessed, if you were actually implementing a neural network, doing this with a for loop seems really inefficient, so what we're going to do is take these four equations and vectorize them.
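
As a sketch of the explicit, unit-by-unit version (with randomly chosen placeholder parameters), the for loop the lecture warns against would look something like this:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([[1.0], [2.0], [3.0]])              # input, shape (3, 1) -- placeholder values
w = [np.random.randn(3, 1) for _ in range(4)]    # w^[1]_1 ... w^[1]_4 -- random placeholders
b = [np.random.randn() for _ in range(4)]        # b^[1]_1 ... b^[1]_4 -- random placeholders

z1 = np.zeros((4, 1))
a1 = np.zeros((4, 1))
for i in range(4):                               # one iteration per hidden unit
    z1[i, 0] = (w[i].T @ x + b[i]).item()        # z^[1]_i = w^[1]T_i x + b^[1]_i
    a1[i, 0] = sigmoid(z1[i, 0])                 # a^[1]_i = sigmoid(z^[1]_i)
```

Vectorizing, as done next, replaces this loop with a single matrix-vector product.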

I'm going to start by showing how to compute z as a vector. It turns out you can do it as follows. Take these w's and stack them into a matrix. Then you have $w^{[1]T}_1$; a column vector transposed gives you a row vector. Then $w^{[1]T}_2$, $w^{[1]T}_3$ and $w^{[1]T}_4$. By stacking those four w vectors together, you end up with a matrix. Another way to think of this is that we have four logistic-regression-like units there, and each of them has a corresponding parameter vector w; by stacking those four vectors together, you end up with this 4-by-3 matrix. If you then take this matrix and multiply it by your input features $x_1, x_2, x_3$, then by the way matrix multiplication works you end up with $w^{[1]T}_1 x$, $w^{[1]T}_2 x$, $w^{[1]T}_3 x$, $w^{[1]T}_4 x$. And then let's not forget the b's.
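
Here is a small sketch of that stacking step with random placeholder vectors: each transposed $w^{[1]}_i$ becomes one row of a 4-by-3 matrix, and a single matrix-vector product then produces all four dot products at once.

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])              # input features x1, x2, x3 as a (3, 1) column -- placeholders
w = [np.random.randn(3, 1) for _ in range(4)]    # the four parameter vectors w^[1]_1 ... w^[1]_4 -- placeholders

W1 = np.vstack([wi.T for wi in w])               # stack the transposed vectors as rows -> shape (4, 3)
Wx = W1 @ x                                      # (4, 3) @ (3, 1) -> (4, 1); row i is w^[1]T_i x
print(W1.shape, Wx.shape)                        # (4, 3) (4, 1)
```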

So we now add to this the vector $(b^{[1]}_1, b^{[1]}_2, b^{[1]}_3, b^{[1]}_4)^T$. You can see that each of the four rows of the result corresponds exactly to each of the four quantities we had above. In other words, we've just shown that this thing equals $(z^{[1]}_1, z^{[1]}_2, z^{[1]}_3, z^{[1]}_4)^T$, as defined here. Perhaps not surprisingly, we're going to call this whole thing the vector $z^{[1]}$, obtained by stacking the individual z's into a column vector. When we're vectorizing, one rule of thumb that may help you navigate this is: when we have different nodes in a layer, we stack them vertically.
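
A quick sketch (again with random placeholder values) to check that each row of $W^{[1]} x + b^{[1]}$ matches the corresponding per-unit quantity $w^{[1]T}_i x + b^{[1]}_i$:

```python
import numpy as np

x = np.random.randn(3, 1)                        # placeholder input
w = [np.random.randn(3, 1) for _ in range(4)]    # per-unit weight vectors w^[1]_i -- placeholders
b1 = np.random.randn(4, 1)                       # b^[1], shape (4, 1) -- placeholder

W1 = np.vstack([wi.T for wi in w])               # W^[1], shape (4, 3)
z1 = W1 @ x + b1                                 # z^[1], shape (4, 1)

for i in range(4):                               # row i of z^[1] equals w^[1]T_i x + b^[1]_i
    assert np.isclose(z1[i, 0], (w[i].T @ x).item() + b1[i, 0])
```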

That's why, when you have $z^{[1]}_1$ through $z^{[1]}_4$, which correspond to four different nodes in the hidden layer, we stack these four numbers vertically to form the vector $z^{[1]}$. To use one more piece of notation: this 4-by-3 matrix, which we obtained by stacking the lowercase vectors $w^{[1]}_1$, $w^{[1]}_2$ and so on, we're going to call the matrix $W^{[1]}$, and similarly this vector we're going to call $b^{[1]}$; it is a 4-by-1 vector. So now we've computed z using this vector-matrix notation. The last thing we need to do is also compute the values of a, and it probably won't surprise you that we define $a^{[1]}$ by stacking together the activation values $a^{[1]}_1$ through $a^{[1]}_4$: just take these four values and stack them into a vector called $a^{[1]}$.

And this is going to be $\sigma(z^{[1]})$, where the implementation of the sigmoid function takes in the four elements of $z^{[1]}$ and applies the sigmoid function to them element-wise. So, just to recap, we figured out that $z^{[1]} = W^{[1]} x + b^{[1]}$ and $a^{[1]} = \sigma(z^{[1]})$. Let's copy this to the next slide. What we see is that for the first layer of the neural network, given an input x, we have $z^{[1]} = W^{[1]} x + b^{[1]}$ and $a^{[1]} = \sigma(z^{[1]})$. The dimensions here are 4-by-1: a 4-by-3 matrix times a 3-by-1 vector, plus a 4-by-1 vector $b^{[1]}$, gives a 4-by-1 result, and $a^{[1]}$ has the same dimensions. And remember that we said $x = a^{[0]}$, just like $\hat{y} = a^{[2]}$.
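
Putting the first layer together as a sketch, with randomly initialized placeholder parameters chosen only to match the 3-input, 4-hidden-unit shapes described above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)      # input features (also a^[0]), shape (3, 1) -- placeholder
W1 = np.random.randn(4, 3)     # W^[1]: 4 hidden units by 3 input features -- placeholder
b1 = np.random.randn(4, 1)     # b^[1], shape (4, 1) -- placeholder

z1 = W1 @ x + b1               # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
a1 = sigmoid(z1)               # element-wise sigmoid -> (4, 1)
print(z1.shape, a1.shape)      # (4, 1) (4, 1)
```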

So if you want, you can actually take this x and replace it with $a^{[0]}$, since $a^{[0]}$ is, if you like, an alias for the vector of input features x. Through a similar derivation you can figure out that the representation for the next layer can be written similarly. The output layer has associated with it the parameters $W^{[2]}$ and $b^{[2]}$; in this case $W^{[2]}$ is a 1-by-4 matrix and $b^{[2]}$ is just a real number, i.e. 1-by-1. So $z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$ is a real number (a 1-by-1 matrix): a 1-by-4 matrix times a 4-by-1 vector, plus the 1-by-1 $b^{[2]}$, gives you just a real number. If you think of this last output unit as analogous to logistic regression, which had parameters w and b, then w plays a role analogous to $W^{[2]}$ transposed (or $W^{[2]}$ is really $w^T$) and $b = b^{[2]}$. If you cover up the left part of this network and ignore all of that for now, this last output unit is a lot like logistic regression, except that instead of writing the parameters as w and b, with dimensions 1-by-4 and 1-by-1, we write them as $W^{[2]}$ and $b^{[2]}$. Just to recap: for logistic regression, to implement the output, or implement prediction, you compute $z = w^T x + b$ and $\hat{y} = a = \sigma(z)$.
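
The output layer follows the same pattern; a sketch with placeholder parameters, where $W^{[2]}$ is 1-by-4 and $b^{[2]}$ is 1-by-1:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a1 = np.random.randn(4, 1)     # activations from the hidden layer, shape (4, 1) -- placeholder
W2 = np.random.randn(1, 4)     # W^[2], shape (1, 4) -- placeholder
b2 = np.random.randn(1, 1)     # b^[2], shape (1, 1) -- placeholder

z2 = W2 @ a1 + b2              # (1, 4) @ (4, 1) + (1, 1) -> (1, 1), a single real number
a2 = sigmoid(z2)               # y-hat = a^[2]
print(z2.shape, a2.shape)      # (1, 1) (1, 1)
```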

When you have a neural network with one hidden layer, all you need to implement to compute its output are these four equations. You can think of this as a vectorized implementation of computing the outputs of the four logistic-regression-like units in the hidden layer (that's what the first two equations do), followed by the logistic regression in the output layer (that's what the last two do). I hope this description made sense; the takeaway is that to compute the output of this neural network, all you need are those four lines of code. So now you've seen how, given a single input feature vector x, you can compute the output of this neural network with four lines of code, similar to what we did for logistic regression.
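
In NumPy, those four equations translate almost line for line; the helper and function names below are my own, used only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_one_example(x, W1, b1, W2, b2):
    """Compute y-hat for one input column x with a single hidden layer."""
    z1 = W1 @ x + b1       # z^[1] = W^[1] x + b^[1]
    a1 = sigmoid(z1)       # a^[1] = sigmoid(z^[1])
    z2 = W2 @ a1 + b2      # z^[2] = W^[2] a^[1] + b^[2]
    a2 = sigmoid(z2)       # a^[2] = y-hat
    return a2
```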

We will also want to vectorize across multiple training examples. We'll see that by stacking training examples as different columns of a matrix, with just a slight modification to these formulas, you'll be able, similar to what you saw for logistic regression, to compute the output of this neural network not just on one example at a time but on your entire training set at a time. Let's see the details of that in the next video.
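
As a preview of that idea (the details are in the next video), here is a hedged sketch assuming the m training examples are stacked as the columns of a (3, m) matrix X, so that NumPy broadcasting adds the bias to every column:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5                                                    # a made-up number of training examples
X = np.random.randn(3, m)                                # each column is one example -- placeholder data
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)    # placeholder parameters
W2, b2 = np.random.randn(1, 4), np.random.randn(1, 1)

A1 = sigmoid(W1 @ X + b1)                                # (4, m): column i is a^[1] for example i
A2 = sigmoid(W2 @ A1 + b2)                               # (1, m): column i is y-hat for example i
```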


Key takeaways:

Computing the output of a neural network

For every layer other than the input layer, the output computation follows the same pattern: each node performs two computations, the z computation followed by the a (activation) computation.

In code, we use vectorization to compute the neural network's output. For the network structure used in this lecture, implementing the following formulas in Python is all that is needed to compute the network's output:

$X = a^{[0]}$

$z^{[1]} = W^{[1]} a^{[0]} + b^{[1]}$

$a^{[1]} = \sigma(z^{[1]})$

$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$

$a^{[2]} = \sigma(z^{[2]})$
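
One possible end-to-end sketch of these formulas in Python/NumPy; the random initialization is a placeholder chosen only to match the 3-4-1 architecture used in this lecture:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Placeholder parameters for a 3-input, 4-hidden-unit, 1-output network
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(1, 4), np.random.randn(1, 1)

a0 = np.random.randn(3, 1)     # X = a^[0], a single input example -- placeholder
z1 = W1 @ a0 + b1              # z^[1] = W^[1] a^[0] + b^[1]
a1 = sigmoid(z1)               # a^[1] = sigma(z^[1])
z2 = W2 @ a1 + b2              # z^[2] = W^[2] a^[1] + b^[2]
a2 = sigmoid(z2)               # a^[2] = sigma(z^[2]), the network's output y-hat
print(a2)                      # a 1x1 array
```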

References:

[1] 大树先生. Distilled notes on Andrew Ng's Coursera deep learning course (DeepLearning.ai), Part 1-3: Shallow Neural Networks.


PS: You are welcome to follow the WeChat official account "SelfImprovementLab", which focuses on deep learning, machine learning, and artificial intelligence, with occasional check-in groups for early rising, reading, exercise, English, and more.
