Coursera | Andrew Ng (01-week-3-3.2): Neural Network Representation

This series only adds personal study notes and supplementary derivations to some of the points in the original course. If there are any mistakes, corrections and feedback are welcome. After working through Andrew Ng's course, I wrote it up as text to make it easier to review and look things up. Since I have been studying English, this series is primarily in English, and I suggest readers rely mainly on the English as well, with Chinese as support, to lay the groundwork for reading academic papers in this field later on. - ZJ

Coursera course | deeplearning.ai | 网易云课堂 (NetEase Cloud Classroom)


Please credit the author and source when republishing: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/78977418


3.2 Neural Network Representation

(Subtitle source: 网易云课堂 NetEase Cloud Classroom)

(figure: a neural network with three input features, one hidden layer of four units, and a single output unit)

You've seen me draw a few pictures of neural networks. In this video, we'll talk about exactly what those pictures mean, in other words, exactly what those neural networks we've been drawing represent. We'll start by focusing on the case of a neural network with what's called a single hidden layer. Here's a picture of a neural network. Let's give different parts of this picture some names. We have the input features $x_1, x_2, x_3$ stacked vertically, and this is called the input layer of the neural network. So, maybe not surprisingly, this contains the inputs to the neural network. Then there's another layer of circles, and this is called a hidden layer of the neural network.


I'll come back in a second to say what the word "hidden" means, but the final layer here is formed, in this case, by just one node, and this single-node layer is called the output layer. It's responsible for generating the predicted value $\hat{y}$. In a neural network that you train with supervised learning, the training set contains values of the inputs $x$ as well as the target outputs $y$. So the term "hidden layer" refers to the fact that, in the training set, the true values for these nodes in the middle are not observed. That is, you don't see what they should be in the training set. You see what the inputs are, and you see what the output should be, but the things in the hidden layer are not seen in the training set. So that kind of explains the name: "hidden" just means you don't see it in the training set.

Let's introduce a bit more notation. Whereas previously we were using the vector $x$ to denote the input features, an alternative notation for the values of the input features will be $a^{[0]}$, with a superscript square bracket zero. The term $a$ stands for activations, and it refers to the values that different layers of the neural network are passing on to the subsequent layers. So the input layer passes on the value $x$ to the hidden layer, and we're going to call the activations of the input layer $a^{[0]}$. The next layer, the hidden layer, will in turn generate some set of activations, which I'm going to write as $a^{[1]}$. So, in particular, this first unit, or this first node, generates the value $a^{[1]}_1$.


This second node generates the value $a^{[1]}_2$, with a subscript 2, and so on. So $a^{[1]}$ is a four-dimensional vector, or, if you write it in Python, a 4-by-1 matrix, i.e. a column vector, which looks like this. It's four-dimensional because in this case we have four nodes, or four units, or four hidden units, in this hidden layer. Then, finally, the output layer will generate some value $a^{[2]}$, which is just a real number, and so $\hat{y}$ is going to take on the value of $a^{[2]}$. This is analogous to how, in logistic regression, we have $\hat{y} = a$. In logistic regression we only had that one output layer, so we didn't use the superscript square brackets, but with a neural network we're now going to use the superscript square brackets to explicitly indicate which layer a value came from.

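To make these shapes concrete, here is a minimal NumPy sketch (my own illustration, not code from the course; the names `a0`, `a1`, `a2` are just placeholders), assuming the network drawn above with three input features, four hidden units, and one output unit:

```python
import numpy as np

# Activations of the input layer, a^[0] = x: the three input
# features stacked vertically as a 3-by-1 column vector.
a0 = np.array([[0.5], [1.2], [-0.3]])   # shape (3, 1)

# Activations of the hidden layer, a^[1]: a 4-by-1 column vector,
# one entry a^[1]_i per hidden unit.
a1 = np.zeros((4, 1))                   # shape (4, 1)

# Activation of the output layer, a^[2]: a single real number,
# which y-hat takes on.
a2 = np.zeros((1, 1))                   # shape (1, 1)

print(a0.shape, a1.shape, a2.shape)     # (3, 1) (4, 1) (1, 1)
```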


One funny thing about notational conventions in neural networks is that this network you're seeing here is called a two-layer neural network. The reason is that when we count layers in neural networks, we don't count the input layer, so the hidden layer is layer 1 and the output layer is layer 2. In our notational convention we're calling the input layer layer 0, so technically there are maybe three layers in this neural network, because there's the input layer, the hidden layer, and the output layer. But in conventional usage, if you read research papers, and also in this course, you'll see people refer to this particular neural network as a two-layer neural network, because we don't count the input layer as an official layer. Finally, something we'll get to later is that the hidden layer and the output layer have parameters associated with them. So the hidden layer will have parameters $W$ and $b$ associated with it, and I'm going to write these with superscript square bracket 1, as $W^{[1]}$ and $b^{[1]}$, to indicate that these are parameters associated with layer 1, the hidden layer.

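As a small aside (my own sketch, and `layer_dims` is an illustrative name, not course notation), the counting convention can be expressed in code: the input layer sits at index 0 and is not counted, so the network here is a two-layer network:

```python
# Layer sizes: n^[0] = 3 inputs, n^[1] = 4 hidden units, n^[2] = 1 output.
layer_dims = [3, 4, 1]

# The input layer (index 0) is not counted as an official layer.
num_layers = len(layer_dims) - 1
print(num_layers)  # 2, hence a "two-layer neural network"
```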


We'll see later that $W^{[1]}$ will be a 4-by-3 matrix and $b^{[1]}$ will be a 4-by-1 vector in this example, where the first coordinate, 4, comes from the fact that we have four nodes, or four hidden units, there, and the 3 comes from the fact that we have three input features. We'll talk later about the dimensions of these matrices, and it might make more sense at that time. Similarly, the output layer also has parameters associated with it, $W^{[2]}$ and $b^{[2]}$, and it turns out that their dimensions are 1-by-4 and 1-by-1. It's 1-by-4 because the hidden layer has four hidden units while the output layer has just one unit. We'll go over the dimensions of these matrices and vectors in a later video.

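As an illustration of these dimensions (a minimal sketch under the same 3-4-1 assumptions; the variable names `W1`, `b1`, `W2`, `b2` are mine), the parameters could be created like this:

```python
import numpy as np

n_x, n_h, n_y = 3, 4, 1   # input features, hidden units, output units

# Hidden-layer parameters: W^[1] is (4, 3), b^[1] is (4, 1).
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))

# Output-layer parameters: W^[2] is (1, 4), b^[2] is (1, 1).
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))

print(W1.shape, b1.shape)  # (4, 3) (4, 1)
print(W2.shape, b2.shape)  # (1, 4) (1, 1)
```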


So you've just seen what a two-layer neural network looks like; that is, a neural network with one hidden layer. In the next video, let's go deeper into exactly what this neural network is computing, that is, how this neural network takes the input $x$ and goes all the way to computing the output $\hat{y}$.



Key takeaways:

Diagram of a simple neural network:

(figure: two-layer neural network showing the input, hidden, and output layers with their parameter dimensions)

The basic structure and notation of the neural network can be read off the figure above, so they are not repeated here.

The main thing to pay attention to is the dimensions of the parameter matrices between layers:

  • Between the input layer and the hidden layer

    $W^{[1]} \to (4, 3)$: the leading 4 is the number of hidden-layer neurons, the trailing 3 is the number of input-layer neurons;
    $b^{[1]} \to (4, 1)$: matches the number of hidden-layer neurons;

  • Between the hidden layer and the output layer

    $W^{[2]} \to (1, 4)$: the leading 1 is the number of output-layer neurons, the trailing 4 is the number of hidden-layer neurons;
    $b^{[2]} \to (1, 1)$: matches the number of output-layer neurons;

From the above we can generalize: in a neural network, consider any two adjacent layers, with the earlier layer as the input and the later layer as the output. The parameter matrix $w$ between the two layers has size $(n_{out}, n_{in})$, and the parameter $b$ has size $(n_{out}, 1)$. This is stated in terms of the linear relation $z = wX + b$; in a neural network, $w^{[i]} = w^T$.
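This shape rule can be checked with a short sketch (again my own illustration, assuming the same 3-4-1 network; activations are omitted since only the shapes are being verified):

```python
import numpy as np

def make_layer(n_in, n_out):
    """Parameters for one layer computing z = W x + b."""
    W = np.random.randn(n_out, n_in)   # shape (n_out, n_in)
    b = np.zeros((n_out, 1))           # shape (n_out, 1)
    return W, b

x = np.random.randn(3, 1)              # one example with 3 features
W1, b1 = make_layer(3, 4)              # input layer -> hidden layer
W2, b2 = make_layer(4, 1)              # hidden layer -> output layer

z1 = W1 @ x + b1                       # shape (4, 1)
z2 = W2 @ z1 + b2                      # shape (1, 1)
print(z1.shape, z2.shape)
```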

In logistic regression, we usually use $(n_{in}, n_{out})$ to denote the parameter size, and the computation uses the formula $z = w^T X + b$; pay attention to the difference between these two conventions.
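To spell out the contrast (my own sketch): the logistic-regression convention stores the weights as an $(n_{in}, 1)$ column vector and transposes at use time, while the neural-network convention stores the already-transposed $(n_{out}, n_{in})$ matrix, so both compute the same $z$:

```python
import numpy as np

x = np.random.randn(3, 1)

# Logistic-regression convention: w is (n_in, 1) = (3, 1),
# and the formula transposes it: z = w^T x + b.
w = np.random.randn(3, 1)
b = 0.0
z_lr = w.T @ x + b          # shape (1, 1)

# Neural-network convention: W^[1] = w^T is (n_out, n_in) = (1, 3),
# so no transpose is needed: z = W x + b.
W = w.T
z_nn = W @ x + b            # shape (1, 1)

assert np.allclose(z_lr, z_nn)
```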



PS: You are welcome to scan the QR code and follow the official account 「SelfImprovementLab」! It focuses on deep learning, machine learning, and artificial intelligence, as well as early rising, reading, exercise, English, and more, with occasional group check-in and mutual-support activities.
