Coursera | Andrew Ng (01-week-3-3.4): Vectorizing Across Multiple Examples

This series only adds personal study notes and supplementary derivations to selected points of the original course. If there are any mistakes, corrections and feedback are welcome. Having worked through Andrew Ng's course, I organized it into text for easier review. Since I have been studying English, the series is primarily in English, and readers are encouraged to rely on the English as well, as groundwork for later reading academic papers in this field. - ZJ

Coursera course | deeplearning.ai | NetEase Cloud Classroom


Please credit the author and source when reposting: ZJ, WeChat public account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/78978738


3.4 Vectorizing Across Multiple Examples

(Subtitle source: NetEase Cloud Classroom)


In the last video, you saw how to compute the prediction of a neural network for a single training example. In this video, you'll see how to vectorize across multiple training examples. The outcome will be quite similar to what you saw for logistic regression: by stacking up different training examples in different columns of a matrix, you'll be able to take the equations from the previous video and, with very little modification, change them so that the neural network computes the outputs on all the examples pretty much at the same time. So let's see the details of how to do that. These were the four equations we had from the previous video for computing $z^{[1]}$, $a^{[1]}$, $z^{[2]}$ and $a^{[2]}$.
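As a concrete reference, here is a minimal numpy sketch of those four single-example equations. The layer sizes ($n_x = 3$ input features, $n_h = 4$ hidden units, one output unit), the random initialization, and the variable names are illustrative assumptions, not something specified in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h = 3, 4                                # assumed layer sizes

W1 = rng.standard_normal((n_h, n_x)) * 0.01    # layer-1 weights, shape (n_h, n_x)
b1 = np.zeros((n_h, 1))                        # layer-1 bias
W2 = rng.standard_normal((1, n_h)) * 0.01      # layer-2 weights, shape (1, n_h)
b2 = np.zeros((1, 1))                          # layer-2 bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.standard_normal((n_x, 1))              # a single training example (column vector)

z1 = W1 @ x + b1          # z^[1] = W^[1] x + b^[1]
a1 = sigmoid(z1)          # a^[1] = sigma(z^[1])
z2 = W2 @ a1 + b2         # z^[2] = W^[2] a^[1] + b^[2]
a2 = sigmoid(z2)          # a^[2] = y-hat for this example
```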


They tell you how, given an input feature vector $x$, you can use them to generate $a^{[2]} = \hat{y}$ for a single training example. Now, if you have m training examples, you need to repeat this process: for the first training example $x^{(1)}$, compute $\hat{y}^{(1)}$, the prediction on your first training example; then use $x^{(2)}$ to generate the prediction $\hat{y}^{(2)}$, and so on down to $x^{(m)}$ to generate the prediction $\hat{y}^{(m)}$. To write this with the activation notation as well, I'm going to write these as $a^{[2](1)}$, $a^{[2](2)}$, and so on up to $a^{[2](m)}$. In this notation $a^{[2](i)}$, the round-bracket $(i)$ refers to training example $i$, and the square-bracket $[2]$ refers to layer 2.


So that's how the square-bracket and round-bracket indices work. This suggests that if you have an unvectorized implementation and want to compute the predictions on all your training examples, you need a loop: for $i = 1$ to $m$, implement these four equations: $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$, $a^{[1](i)} = \sigma(z^{[1](i)})$, $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$, and $a^{[2](i)} = \sigma(z^{[2](i)})$. These are basically the four equations on top, with a superscript round-bracket $(i)$ added to all the variables that depend on the training example (that is, to $x$, $z$ and $a$), if you want to compute all the outputs on your m training examples.
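A sketch of that unvectorized loop, continuing the snippet above (it reuses `W1`, `b1`, `W2`, `b2`, `sigmoid`, `rng`, and `n_x`); the value `m = 5` and the list `X_cols` of per-example column vectors are illustrative assumptions.

```python
# Unvectorized forward pass: loop over the m training examples one at a time.
m = 5
X_cols = [rng.standard_normal((n_x, 1)) for _ in range(m)]   # x^(1), ..., x^(m)

predictions = []
for i in range(m):
    z1_i = W1 @ X_cols[i] + b1     # z^[1](i) = W^[1] x^(i) + b^[1]
    a1_i = sigmoid(z1_i)           # a^[1](i) = sigma(z^[1](i))
    z2_i = W2 @ a1_i + b2          # z^[2](i) = W^[2] a^[1](i) + b^[2]
    a2_i = sigmoid(z2_i)           # a^[2](i) = y-hat^(i)
    predictions.append(a2_i)
```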

What we'd like to do is vectorize this whole computation so as to get rid of this for loop. By the way, in case it seems like I'm going through a lot of nitty-gritty linear algebra, it turns out that being able to implement this correctly is important in the deep learning era. We actually chose the notation very carefully for this course to make these vectorizations as easy as possible, so I hope that going through this nitty-gritty will help you more quickly get correct implementations of these algorithms working. Let me just copy this whole block of code to the next slide, and then we'll see how to vectorize it. So here's what we have from the previous slide, with a for loop going over all m training examples.


Recall that we defined the matrix $X$ to be our training examples stacked up in columns: take the training examples and stack them in columns, so $X$ becomes an $n_x \times m$ matrix. I'm just going to give away the punchline and tell you what you need to implement in order to have a vectorized version of this for loop. It turns out you need to compute $Z^{[1]} = W^{[1]} X + b^{[1]}$, $A^{[1]} = \sigma(Z^{[1]})$, then $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$, and then $A^{[2]} = \sigma(Z^{[2]})$. The analogy is that we went from the lowercase vectors $x$ to this capital $X$ matrix by stacking the lowercase $x$'s up in different columns.
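Continuing the same sketch, the vectorized forward pass looks like this; `np.hstack` is used here only to build the assumed $X$ matrix from the per-example column vectors introduced above.

```python
# Vectorized forward pass: one matrix multiply per layer handles all m examples.
X = np.hstack(X_cols)      # shape (n_x, m): one training example per column

Z1 = W1 @ X + b1           # Z^[1] = W^[1] X + b^[1]  (b1 broadcasts across the m columns)
A1 = sigmoid(Z1)           # A^[1], shape (n_h, m)
Z2 = W2 @ A1 + b2          # Z^[2] = W^[2] A^[1] + b^[2]
A2 = sigmoid(Z2)           # A^[2], shape (1, m): one prediction y-hat^(i) per column
```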

If you do the same thing for the $z$'s, for example take $z^{[1](1)}$, $z^{[1](2)}$, and so on (these are all column vectors), up to $z^{[1](m)}$, and stack all m of them in columns, then this gives you the matrix $Z^{[1]}$. Similarly, take $a^{[1](1)}$, $a^{[1](2)}$, and so on up to $a^{[1](m)}$ and stack them up in columns; then, just as we went from the lowercase $x$'s to the capital $X$ and from lowercase $z$ to capital $Z$, this takes you from the lowercase $a$'s, which are vectors, to the capital $A^{[1]}$ over there.
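A quick check of that stacking claim, continuing the snippets above: building $Z^{[1]}$ column by column from the per-example vectors gives the same matrix as the vectorized computation.

```python
# Column i of Z^[1] is exactly z^[1](i) computed from training example x^(i).
Z1_stacked = np.hstack([W1 @ x_i + b1 for x_i in X_cols])
print(np.allclose(Z1_stacked, Z1))   # True
```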

Similarly, $Z^{[2]}$ and $A^{[2]}$ are obtained by taking the corresponding per-example vectors and stacking them horizontally. One property of this notation that might help you think about it is that in these matrices, say $Z$ and $A$, the horizontal index runs across training examples: as you sweep from left to right, you're scanning through the training set. The vertical index corresponds to different nodes in the neural network. So, for example, the value at the top-left corner of the matrix $A^{[1]}$ is the activation of the first hidden unit on the first training example; one value down is the activation of the second hidden unit on the first training example, then the third hidden unit on the first training example, and so on.

As you scan down, you're indexing over the hidden unit number, whereas if you move horizontally, you go from the first hidden unit on the first training example to the first hidden unit on the second training example, then the third training example, and so on, until the node at the far right corresponds to the activation of the first hidden unit on the final, m-th training example. So horizontally the matrix $A$ goes over the different training examples, and vertically it indexes the different hidden units. A similar intuition holds for the matrix $Z$, as well as for $X$: horizontally it corresponds to different training examples, and vertically to different input features, which are really the different nodes of the input layer of the neural network.
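In numpy terms (continuing the sketch above), that indexing convention reads as follows; the specific indices are just illustrative.

```python
# Rows of A^[1] index hidden units, columns index training examples.
print(A1.shape)        # (n_h, m) = (4, 5) with the sizes assumed above
print(A1[0, 0])        # first hidden unit, first training example
print(A1[1, 0])        # second hidden unit, first training example
print(A1[0, m - 1])    # first hidden unit, last (m-th) training example
```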

So with these equations you now know how to implement a neural network with vectorization, that is, vectorization across multiple examples. In the next video, I'll give you a bit more justification for why this is a correct implementation of this type of vectorization. It turns out that justification will be similar to what you saw for logistic regression. Let's go on to the next video.

$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$

$$a^{[1](i)} = \sigma\left(z^{[1](i)}\right)$$

$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$

$$a^{[2](i)} = \sigma\left(z^{[2](i)}\right)$$
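As a final sanity check on the two versions sketched above, the column-stacked predictions from the for loop should match the vectorized output exactly; this reuses `predictions` and `A2` from the earlier snippets.

```python
# Column i of A^[2] from the vectorized pass equals a^[2](i) from the loop.
loop_A2 = np.hstack(predictions)   # shape (1, m)
print(np.allclose(loop_A2, A2))    # True
```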


PS: You are welcome to scan the code and follow the public account 「SelfImprovementLab」, which focuses on deep learning, machine learning, and artificial intelligence, and occasionally organizes group check-in activities on early rising, reading, exercise, English, and other topics.
