Coursera | Andrew Ng (01-week-4-4.5): Why deep representations?

This series only adds personal study notes and supplementary derivations on top of the original course; if there are any errors, corrections are welcome. Building on Andrew Ng's course, I have organized the material into text to make it easier to review. Since I am studying English, the series is primarily in English, and readers are encouraged to read mainly in English with Chinese as a supplement, as preparation for reading academic papers in related fields later on. - ZJ

Coursera course | deeplearning.ai | NetEase Cloud Classroom


Please credit the author and source when reposting: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/79034219


4.5 Why deep representations?

(Subtitle source: NetEase Cloud Classroom)


We have all been hearing that deep neural networks work really well for a lot of problems. It is not just that they need to be big neural networks; specifically, they need to be deep, that is, to have a lot of hidden layers. So why is that? Let's go through a couple of examples and try to gain some intuition for why deep networks might work well. First, what is a deep network computing? If you are building a system for face recognition or face detection, here is what the deep neural network could be doing. Perhaps you input a picture of a face; then you can think of the first layer of the neural network as being a feature detector, or an edge detector.

In this example I am plotting what a neural network with maybe twenty hidden units might be trying to compute on this image, with the twenty hidden units visualized by little square boxes. For example, one small visualization represents a hidden unit that is trying to figure out where edges of a particular orientation are in the image, and another hidden unit might be trying to figure out where the horizontal edges are. When we talk about convolutional networks in a later course, this particular visualization will make a bit more sense. Informally, you can think of the first layer of the neural network as looking at a picture and trying to figure out where the edges in that picture are. Having found the edges by grouping pixels together, the network can then take the detected edges and group them together to form parts of faces.
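
To make the "first layer as edge detector" idea concrete, here is a minimal NumPy sketch (my own illustration, not from the lecture). It applies a hand-crafted, Sobel-like horizontal-edge filter to a tiny synthetic image; a learned first-layer unit plays a similar role, except that its weights are learned rather than fixed, and the function `conv2d_valid` is just an illustrative helper name.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, the operation a first-layer conv unit performs."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny 6x6 "image": bright on top, dark on the bottom (one horizontal edge).
image = np.vstack([np.ones((3, 6)), np.zeros((3, 6))])

# Hand-crafted horizontal-edge filter; a trained first layer would learn
# weights with a broadly similar structure.
horizontal_edge = np.array([[ 1,  2,  1],
                            [ 0,  0,  0],
                            [-1, -2, -1]])

response = conv2d_valid(image, horizontal_edge)
print(response)  # large values along the rows where the edge lies, zero elsewhere
```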

For example, you might have one little neuron trying to see whether it is finding an eye, and a different neuron trying to find part of the nose. By putting together lots of edges, the network can start to detect different parts of faces, and finally, by putting different parts of faces together, such as a nose, an eye, an ear, or a chin, it can try to recognize or detect different types of faces. Intuitively, you can think of the earlier layers of the neural network as detecting simpler functions, like edges, and the later layers as composing them together so that the network can learn more complex functions.


These visualizations will make more sense when we talk about convolutional nets. One technical detail of this visualization: the edge detectors look at relatively small areas of an image, maybe very small regions, whereas the facial feature detectors look at much larger areas of the image. But the main intuition to take away is that the network starts by finding simple things like edges, then builds them up and composes them to detect more complex things like an eye or a nose, and then composes those to find even more complex things. This type of simple-to-complex hierarchical representation, or compositional representation, applies to other types of data besides images and face recognition as well.

For example, if you are trying to build a speech recognition system, it is less obvious how to visualize speech, but if you input an audio clip, then maybe the first layer of the neural network learns to detect low-level audio waveform features, such as whether the tone is going up or down, whether it is white noise or a sibilant sound, and what the pitch is. By taking low-level waveform features like these and composing them, the network can learn to detect basic units of sound, which in linguistics are called phonemes. For example, in the word "cat", the C sound is one phoneme, the A sound is another, and the T sound is another. Having learned to find the basic units of sound, the network can compose them to recognize words in the audio, and then compose those in order to recognize entire phrases or sentences.


So a deep neural network with multiple hidden layers might have its earlier layers learn these lower-level, simpler features, and then have the later, deeper layers put together the simpler things it has detected in order to detect more complex things, such as recognizing the specific words, phrases, or sentences being uttered, in order to carry out speech recognition. What we see is that whereas the earlier layers compute what seem like relatively simple functions of the input, such as where the edges are, by the time you get deep into the network you can actually do surprisingly complex things, such as detecting faces, or detecting words, phrases, or sentences. Some people like to make an analogy between deep neural networks and the human brain: neuroscientists believe that the human brain also starts off detecting simple things, like the edges in what your eyes see, and builds those up to detect more complex things, like the faces you see.
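
As a minimal sketch of the layer-by-layer composition the lecture describes (my own illustration, with illustrative names `forward`, `relu`, and `layer_sizes`; ReLU is used everywhere for brevity, whereas an output layer for classification would normally use a sigmoid), each layer recombines the previous layer's features into something more complex:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, parameters):
    """Forward pass through L layers: each layer builds on the previous
    layer's features (edges -> parts -> whole objects, informally)."""
    a = x
    for W, b in parameters:      # one (W, b) pair per layer
        z = W @ a + b            # linear combination of the previous features
        a = relu(z)              # nonlinearity keeps the composition expressive
    return a

# Toy example: 8 input features -> three hidden layers -> 1 output unit.
rng = np.random.default_rng(0)
layer_sizes = [8, 16, 8, 4, 1]
parameters = [(rng.standard_normal((n_out, n_in)) * 0.1, np.zeros((n_out, 1)))
              for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.standard_normal((8, 1))
print(forward(x, parameters).shape)  # (1, 1)
```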

I think analogies between deep learning and the human brain are sometimes a little bit dangerous, but there is a lot of truth to this being how we think the human brain works: the brain probably detects simple things like edges first, and then puts them together to form more and more complex objects, and that has served as a loose form of inspiration for some deep learning as well. We will say a bit more about the human brain, or about the biological brain, in a later video this week.

The other piece of intuition about why deep neural networks seem to work well is the following. This result comes from circuit theory, which concerns what types of functions you can compute with different logic gates: AND, OR, and NOT gates. Informally, there are functions you can compute with a relatively small but deep neural network, where by small I mean the number of hidden units is relatively small; but if you try to compute the same function with a shallow network, that is, without enough hidden layers, you might require exponentially more hidden units.


Let me give you one example and illustrate this a bit informally. Say you are trying to compute the exclusive OR, or the parity, of all your input features: x1 XOR x2 XOR x3 XOR ... XOR xn, where you have n (or n_x) features. If you build an XOR tree, you first compute the XOR of x1 and x2, then take x3 and x4 and compute their XOR, and so on. Technically, if you are only using AND, OR, and NOT gates, you might need a couple of layers rather than just one to compute a single XOR, but you can still compute it with a relatively small circuit. Continuing like this, you build an XOR tree, and eventually you have a circuit whose output, call it y-hat = y, is the exclusive OR, or the parity, of all of these input bits. With this type of XOR tree, the depth of the network is on the order of log(n), and the number of nodes, circuit components, or gates in the network is not that large; you do not need that many gates to compute the exclusive OR. But now suppose you are not allowed to use a neural network with multiple hidden layers, in this case with on the order of log(n) hidden layers.
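
Here is a small Python sketch (my own illustration, not from the lecture; `parity_tree` is an illustrative name) that computes the parity with a balanced pairwise XOR tree and counts how many 2-input XOR "gates" and how many levels it uses, matching the n - 1 gates and O(log n) depth described above:

```python
from math import ceil, log2

def parity_tree(bits):
    """Compute x1 XOR x2 XOR ... XOR xn with a balanced pairwise tree.
    Returns the parity, the number of 2-input XOR gates, and the depth."""
    level = list(bits)
    gates, depth = 0, 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])  # one 2-input XOR gate
            gates += 1
        if len(level) % 2:                       # odd element passes through
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], gates, depth

bits = [1, 0, 1, 1, 0, 1, 0, 0]                  # n = 8
parity, gates, depth = parity_tree(bits)
print(parity, gates, depth)                      # 0, 7 gates (n - 1), depth 3 (log2 8)
assert gates == len(bits) - 1 and depth == ceil(log2(len(bits)))
```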


If you are forced to compute this function with just one hidden layer, so that all of the inputs feed into a single set of hidden units which then output y, then in order to compute this XOR or parity function the hidden layer will need to be exponentially large, because essentially you need to exhaustively enumerate on the order of 2^n possible configurations of the input bits that result in the exclusive OR being either 1 or 0. So you end up needing a hidden layer that is exponentially large in the number of bits; technically you could do it with 2^(n-1) hidden units, but that is still on the order of 2^n, exponentially large in the number of bits. I hope this gives you a sense that there are mathematical functions that are much easier to compute with deep networks than with shallow networks. I have to admit that I personally find this result from circuit theory less useful for gaining intuition, but it is one of the results people often cite when explaining the value of very deep representations.
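
The contrast in size can be tabulated directly. The sketch below (my own illustration; `units_needed` is an illustrative name, and the counts simply restate the lecture's informal argument rather than constructing the shallow network) compares the n - 1 gates of the depth-log(n) tree with the roughly 2^(n-1) hidden units a single hidden layer would need:

```python
from math import ceil, log2

def units_needed(n):
    """Informal counts for n-bit parity: deep XOR tree vs. one hidden layer."""
    deep_gates = n - 1            # one 2-input XOR per internal tree node
    deep_depth = ceil(log2(n))    # depth of the balanced tree
    shallow_units = 2 ** (n - 1)  # order of hidden units with a single hidden layer
    return deep_gates, deep_depth, shallow_units

for n in (4, 8, 16, 32):
    print(n, units_needed(n))
# e.g. n = 32: 31 gates at depth 5 for the tree, versus about 2**31 hidden units
```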


Now, in addition to these reasons for preferring deep neural networks, to be perfectly honest I think another reason the term deep learning has taken off is just branding. These things used to be called neural networks with a lot of hidden layers, but the phrase "deep learning" is a great brand; it just sounds so deep. Once that term caught on, it was really a rebranding of neural networks, or of neural networks with many hidden layers, and it helped capture the popular imagination as well. But regardless of the PR branding, deep networks do work well. Sometimes people go overboard and insist on using tons of hidden layers, but when I start out on a new problem I will often begin with logistic regression, then try something with one or two hidden layers, and treat the number of hidden layers as a hyperparameter to tune in order to find the right depth for the network. Over the last several years, however, there has been a trend toward finding that for some applications very, very deep neural networks, with maybe many dozens of layers, can sometimes be the best model for a problem. So those are the intuitions for why deep learning seems to work well. Let's now take a look at the mechanics of how to implement not just forward propagation but also back propagation.


Key points:

Why deep representations?

Face recognition and speech recognition:


For face recognition, the first layer of the neural network extracts contours and edges from the raw image, with each neuron learning information about a different edge. The second layer combines the edge information learned by the first layer into local facial features, such as eyes and mouth. The following layers progressively combine the features of the previous layer until they form the whole face. As the number of layers increases, the features expand from edges to the entire face, going from local to global and from simple to complex, so that deeper networks can capture increasingly complex structure.

For speech recognition, the first layer of the network can learn low-level properties of the speech signal such as tone; deeper layers can detect basic phonemes, then words, and with further depth, phrases and whole sentences.

From these two examples we can see that as the depth of the network increases, the model can learn to solve more complex problems and becomes more powerful.

Circuit and logic-gate computation:


Suppose we want to compute the XOR of n inputs:

$$y = x_1 \oplus x_2 \oplus x_3 \oplus \cdots \oplus x_n$$

For this computation, if we use a deep network in which each layer computes the XOR of adjacent pairs of units from the previous layer, down to a single output, the whole network has the shape of a binary tree with depth $O(\log_2(n))$, and the total number of neurons (XOR units) used is:

$$1 + 2 + \cdots + 2^{\log_2(n)-1} = \frac{1 - 2^{\log_2(n)}}{1 - 2} = 2^{\log_2(n)} - 1 = n - 1$$

That is, with $n$ inputs, the deep tree-structured network needs only $n-1$ neurons in total.

However, if we do not use a deep network and instead use a network with only a single hidden layer, the number of hidden units required is on the order of $2^{n-1}$. For the same problem, a deep network therefore needs far fewer neurons than a shallow one.

References:

[1]. 大树先生.吴恩达Coursera深度学习课程 DeepLearning.ai 提炼笔记(1-4)– 浅层神经网络


PS: You are welcome to scan the QR code and follow the official account 「SelfImprovementLab」, which focuses on deep learning, machine learning, and artificial intelligence, and occasionally organizes mutual check-in groups for early rising, reading, exercise, English, and other habits.
