Coursera | Andrew Ng (01-week-3-3.3): Computing a Neural Network's Output

This series only adds my personal study notes and some supplementary derivations on top of the original course material; if you find any errors, corrections are welcome. Having worked through Andrew Ng's course, I organized it into text to make review and lookup easier. Since I am also studying English, the series is primarily in English, and I suggest readers rely on the English first, with the Chinese as support, as groundwork for reading academic papers in this field later on. - ZJ

Coursera course | deeplearning.ai | 网易云课堂 (NetEase Cloud Classroom)


Please credit the author and source when reposting: ZJ, WeChat official account "SelfImprovementLab"

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/78977757


3.3 Computing a Neural Network's Output

(Subtitle source: 网易云课堂 / NetEase Cloud Classroom)

In the last video you saw what a single-hidden-layer neural network looks like. In this video let's go through the details of exactly how this neural network computes its output. What you'll see is that it is like logistic regression, but repeated many times. So let's take a look: this is a two-layer neural network. Let's go more deeply into exactly what this neural network computes. We said before that for logistic regression, the circle in the diagram really represents two steps of computation: first you compute z, and second you compute the activation a as the sigmoid function of z. A neural network just does this many more times. Let's start by focusing on just one of the nodes in the hidden layer, the first node in the hidden layer.

I've grayed out the other nodes for now. Similar to logistic regression on the left, this node in the hidden layer does two steps of computation. The first step, which we can think of as the left half of the node, computes $z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1$. These are all quantities associated with the first hidden layer, which is why they carry the superscript $[1]$, and this is the first node in the hidden layer, which is why they carry the subscript 1. The second step computes $a^{[1]}_1 = \sigma(z^{[1]}_1)$. For both z and a, the notational convention is $a^{[l]}_i$: the superscript $[l]$ in square brackets refers to the layer number, and the subscript $i$ refers to the node in that layer.
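
To make these two steps concrete, here is a minimal NumPy sketch of a single hidden unit; the input values and parameters below are made-up placeholders, not numbers from the lecture.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1 / (1 + np.exp(-z))

x = np.array([[1.0], [2.0], [3.0]])      # input features, shape (3, 1) -- placeholder values
w1_1 = np.array([[0.1], [0.2], [0.3]])   # w^[1]_1, parameters of hidden unit 1 -- placeholder values
b1_1 = 0.5                               # b^[1]_1, bias of hidden unit 1 -- placeholder value

z1_1 = w1_1.T @ x + b1_1                 # step 1: z^[1]_1 = w^[1]T_1 x + b^[1]_1
a1_1 = sigmoid(z1_1)                     # step 2: a^[1]_1 = sigmoid(z^[1]_1)
print(z1_1.item(), a1_1.item())          # two scalars: the pre-activation and the activation
```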

So the node we've been looking at is node 1 in layer 1, the hidden layer, which is why both the superscript and the subscript are 1. That little circle, the first node in the neural network, represents carrying out these two steps of computation. Now let's look at the second node, the second node in the hidden layer of the neural network. Similar to the logistic regression unit on the left, this little circle also represents two steps of computation: the first step computes $z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2$ (still layer 1, but now the second node), and then $a^{[1]}_2 = \sigma(z^{[1]}_2)$. Feel free to pause the video if you want to double-check that the superscript and subscript notation is consistent with what we wrote above in purple. We've talked through the first two hidden units of the neural network; hidden units three and four represent the same computations.

So now let me take this pair of equations and this pair of equations and copy them to the next slide. Here's our network, and here are the first and second equations that we worked out previously for the first and second hidden units. If you then go through and write out the corresponding equations for the third and fourth hidden units, you get the following. Let's make sure this notation is clear: this is the vector $w^{[1]}_1$ transposed times x, so the superscript T there represents a vector transpose. Now, as you might have guessed, if you were actually implementing a neural network, doing this with a for loop seems really inefficient, so what we're going to do is take these four equations and vectorize them.
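
As a sketch of the explicit, unit-by-unit version (with randomly chosen placeholder parameters), the for loop the lecture warns against would look something like this:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([[1.0], [2.0], [3.0]])              # input, shape (3, 1) -- placeholder values
w = [np.random.randn(3, 1) for _ in range(4)]    # w^[1]_1 ... w^[1]_4 -- random placeholders
b = [np.random.randn() for _ in range(4)]        # b^[1]_1 ... b^[1]_4 -- random placeholders

z1 = np.zeros((4, 1))
a1 = np.zeros((4, 1))
for i in range(4):                               # one iteration per hidden unit
    z1[i, 0] = (w[i].T @ x + b[i]).item()        # z^[1]_i = w^[1]T_i x + b^[1]_i
    a1[i, 0] = sigmoid(z1[i, 0])                 # a^[1]_i = sigmoid(z^[1]_i)
```

Vectorizing, as done next, replaces this loop with a single matrix-vector product.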

I'm going to start by showing how to compute z as a vector. It turns out you can do it as follows. Take these w's and stack them into a matrix. Then you have $w^{[1]T}_1$; a column vector transposed gives you a row vector. Then $w^{[1]T}_2$, $w^{[1]T}_3$ and $w^{[1]T}_4$. By stacking those four w vectors together, you end up with a matrix. Another way to think of this is that we have four logistic-regression-like units there, and each of them has a corresponding parameter vector w; by stacking those four vectors together, you end up with this 4-by-3 matrix. If you then take this matrix and multiply it by your input features $x_1, x_2, x_3$, then by the way matrix multiplication works you end up with $w^{[1]T}_1 x$, $w^{[1]T}_2 x$, $w^{[1]T}_3 x$, $w^{[1]T}_4 x$. And then let's not forget the b's.
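
Here is a small sketch of that stacking step with random placeholder vectors: each transposed $w^{[1]}_i$ becomes one row of a 4-by-3 matrix, and a single matrix-vector product then produces all four dot products at once.

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])              # input features x1, x2, x3 as a (3, 1) column -- placeholders
w = [np.random.randn(3, 1) for _ in range(4)]    # the four parameter vectors w^[1]_1 ... w^[1]_4 -- placeholders

W1 = np.vstack([wi.T for wi in w])               # stack the transposed vectors as rows -> shape (4, 3)
Wx = W1 @ x                                      # (4, 3) @ (3, 1) -> (4, 1); row i is w^[1]T_i x
print(W1.shape, Wx.shape)                        # (4, 3) (4, 1)
```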

So we now add to this the vector $(b^{[1]}_1, b^{[1]}_2, b^{[1]}_3, b^{[1]}_4)^T$. You can see that each of the four rows of the result corresponds exactly to each of the four quantities we had above. In other words, we've just shown that this thing equals $(z^{[1]}_1, z^{[1]}_2, z^{[1]}_3, z^{[1]}_4)^T$, as defined here. Perhaps not surprisingly, we're going to call this whole thing the vector $z^{[1]}$, obtained by stacking the individual z's into a column vector. When we're vectorizing, one rule of thumb that may help you navigate this is: when we have different nodes in a layer, we stack them vertically.
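
A quick sketch (again with random placeholder values) to check that each row of $W^{[1]} x + b^{[1]}$ matches the corresponding per-unit quantity $w^{[1]T}_i x + b^{[1]}_i$:

```python
import numpy as np

x = np.random.randn(3, 1)                        # placeholder input
w = [np.random.randn(3, 1) for _ in range(4)]    # per-unit weight vectors w^[1]_i -- placeholders
b1 = np.random.randn(4, 1)                       # b^[1], shape (4, 1) -- placeholder

W1 = np.vstack([wi.T for wi in w])               # W^[1], shape (4, 3)
z1 = W1 @ x + b1                                 # z^[1], shape (4, 1)

for i in range(4):                               # row i of z^[1] equals w^[1]T_i x + b^[1]_i
    assert np.isclose(z1[i, 0], (w[i].T @ x).item() + b1[i, 0])
```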

That's why, when you have $z^{[1]}_1$ through $z^{[1]}_4$, which correspond to four different nodes in the hidden layer, we stack these four numbers vertically to form the vector $z^{[1]}$. To use one more piece of notation: this 4-by-3 matrix, which we obtained by stacking the lowercase vectors $w^{[1]}_1$, $w^{[1]}_2$ and so on, we're going to call the matrix $W^{[1]}$, and similarly this vector we're going to call $b^{[1]}$; it is a 4-by-1 vector. So now we've computed z using this vector-matrix notation. The last thing we need to do is also compute the values of a, and it probably won't surprise you that we define $a^{[1]}$ by stacking together the activation values $a^{[1]}_1$ through $a^{[1]}_4$: just take these four values and stack them into a vector called $a^{[1]}$.

And this is going to be $\sigma(z^{[1]})$, where the implementation of the sigmoid function takes in the four elements of $z^{[1]}$ and applies the sigmoid function to them element-wise. So, just to recap, we figured out that $z^{[1]} = W^{[1]} x + b^{[1]}$ and $a^{[1]} = \sigma(z^{[1]})$. Let's copy this to the next slide. What we see is that for the first layer of the neural network, given an input x, we have $z^{[1]} = W^{[1]} x + b^{[1]}$ and $a^{[1]} = \sigma(z^{[1]})$. The dimensions here are 4-by-1: a 4-by-3 matrix times a 3-by-1 vector, plus a 4-by-1 vector $b^{[1]}$, gives a 4-by-1 result, and $a^{[1]}$ has the same dimensions. And remember that we said $x = a^{[0]}$, just like $\hat{y} = a^{[2]}$.
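
Putting the first layer together as a sketch, with randomly initialized placeholder parameters chosen only to match the 3-input, 4-hidden-unit shapes described above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.random.randn(3, 1)      # input features (also a^[0]), shape (3, 1) -- placeholder
W1 = np.random.randn(4, 3)     # W^[1]: 4 hidden units by 3 input features -- placeholder
b1 = np.random.randn(4, 1)     # b^[1], shape (4, 1) -- placeholder

z1 = W1 @ x + b1               # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
a1 = sigmoid(z1)               # element-wise sigmoid -> (4, 1)
print(z1.shape, a1.shape)      # (4, 1) (4, 1)
```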

So if you want, you can actually take this x and replace it with $a^{[0]}$, since $a^{[0]}$ is, if you like, an alias for the vector of input features x. Through a similar derivation you can figure out that the representation for the next layer can be written similarly. The output layer has associated with it the parameters $W^{[2]}$ and $b^{[2]}$; in this case $W^{[2]}$ is a 1-by-4 matrix and $b^{[2]}$ is just a real number, i.e. 1-by-1. So $z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$ is a real number (a 1-by-1 matrix): a 1-by-4 matrix times a 4-by-1 vector, plus the 1-by-1 $b^{[2]}$, gives you just a real number. If you think of this last output unit as analogous to logistic regression, which had parameters w and b, then w plays a role analogous to $W^{[2]}$ transposed (or $W^{[2]}$ is really $w^T$) and $b = b^{[2]}$. If you cover up the left part of this network and ignore all of that for now, this last output unit is a lot like logistic regression, except that instead of writing the parameters as w and b, with dimensions 1-by-4 and 1-by-1, we write them as $W^{[2]}$ and $b^{[2]}$. Just to recap: for logistic regression, to implement the output, or implement prediction, you compute $z = w^T x + b$ and $\hat{y} = a = \sigma(z)$.
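
The output layer follows the same pattern; a sketch with placeholder parameters, where $W^{[2]}$ is 1-by-4 and $b^{[2]}$ is 1-by-1:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a1 = np.random.randn(4, 1)     # activations from the hidden layer, shape (4, 1) -- placeholder
W2 = np.random.randn(1, 4)     # W^[2], shape (1, 4) -- placeholder
b2 = np.random.randn(1, 1)     # b^[2], shape (1, 1) -- placeholder

z2 = W2 @ a1 + b2              # (1, 4) @ (4, 1) + (1, 1) -> (1, 1), a single real number
a2 = sigmoid(z2)               # y-hat = a^[2]
print(z2.shape, a2.shape)      # (1, 1) (1, 1)
```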

When you have a neural network with one hidden layer, all you need to implement to compute its output are these four equations. You can think of this as a vectorized implementation of computing the outputs of the four logistic-regression-like units in the hidden layer (that's what the first two equations do), followed by the logistic regression in the output layer (that's what the last two do). I hope this description made sense; the takeaway is that to compute the output of this neural network, all you need are those four lines of code. So now you've seen how, given a single input feature vector x, you can compute the output of this neural network with four lines of code, similar to what we did for logistic regression.
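
In NumPy, those four equations translate almost line for line; the helper and function names below are my own, used only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_one_example(x, W1, b1, W2, b2):
    """Compute y-hat for one input column x with a single hidden layer."""
    z1 = W1 @ x + b1       # z^[1] = W^[1] x + b^[1]
    a1 = sigmoid(z1)       # a^[1] = sigmoid(z^[1])
    z2 = W2 @ a1 + b2      # z^[2] = W^[2] a^[1] + b^[2]
    a2 = sigmoid(z2)       # a^[2] = y-hat
    return a2
```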

We will also want to vectorize across multiple training examples. We'll see that by stacking training examples as different columns of a matrix, with just a slight modification to these formulas, you'll be able, similar to what you saw for logistic regression, to compute the output of this neural network not just on one example at a time but on your entire training set at a time. Let's see the details of that in the next video.
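
As a preview of that idea (the details are in the next video), here is a hedged sketch assuming the m training examples are stacked as the columns of a (3, m) matrix X, so that NumPy broadcasting adds the bias to every column:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

m = 5                                                    # a made-up number of training examples
X = np.random.randn(3, m)                                # each column is one example -- placeholder data
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)    # placeholder parameters
W2, b2 = np.random.randn(1, 4), np.random.randn(1, 1)

A1 = sigmoid(W1 @ X + b1)                                # (4, m): column i is a^[1] for example i
A2 = sigmoid(W2 @ A1 + b2)                               # (1, m): column i is y-hat for example i
```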


Key takeaways:

Computing the output of a neural network

For every layer other than the input layer, the output computation follows the same pattern: each node performs two computations, the z computation followed by the a (activation) computation.

In code, we use vectorization to compute the neural network's output. For the network structure used in this lecture, implementing the following formulas in Python is all that is needed to compute the network's output:

$X = a^{[0]}$

$z^{[1]} = W^{[1]} a^{[0]} + b^{[1]}$

$a^{[1]} = \sigma(z^{[1]})$

$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}$

$a^{[2]} = \sigma(z^{[2]})$
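
One possible end-to-end sketch of these formulas in Python/NumPy; the random initialization is a placeholder chosen only to match the 3-4-1 architecture used in this lecture:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Placeholder parameters for a 3-input, 4-hidden-unit, 1-output network
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(1, 4), np.random.randn(1, 1)

a0 = np.random.randn(3, 1)     # X = a^[0], a single input example -- placeholder
z1 = W1 @ a0 + b1              # z^[1] = W^[1] a^[0] + b^[1]
a1 = sigmoid(z1)               # a^[1] = sigma(z^[1])
z2 = W2 @ a1 + b2              # z^[2] = W^[2] a^[1] + b^[2]
a2 = sigmoid(z2)               # a^[2] = sigma(z^[2]), the network's output y-hat
print(a2)                      # a 1x1 array
```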

References:

[1] 大树先生. Distilled notes on Andrew Ng's Coursera deep learning course (DeepLearning.ai), Part 1-3: Shallow Neural Networks.


PS: You are welcome to follow the WeChat official account "SelfImprovementLab", which focuses on deep learning, machine learning, and artificial intelligence, with occasional check-in groups for early rising, reading, exercise, English, and more.
