Coursera | Andrew Ng (01-week-4-4.5): Why deep representations?

This series only adds personal study notes and supplementary derivations on top of the original course; if there are any errors, corrections are welcome. Building on Andrew Ng's course, I have organized the material into text to make it easier to review. Since I am studying English, the series is primarily in English, and readers are encouraged to read mainly in English with Chinese as a supplement, as preparation for reading academic papers in related fields later on. - ZJ

Coursera course | deeplearning.ai | NetEase Cloud Classroom


Please credit the author and source when reposting: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN: http://blog.csdn.net/junjun_zhao/article/details/79034219


4.5 Why deep representations?

(Subtitle source: NetEase Cloud Classroom)


We have all been hearing that deep neural networks work really well for a lot of problems. It is not just that they need to be big neural networks; specifically, they need to be deep, that is, to have a lot of hidden layers. So why is that? Let's go through a couple of examples and try to gain some intuition for why deep networks might work well. First, what is a deep network computing? If you are building a system for face recognition or face detection, here is what the deep neural network could be doing. Perhaps you input a picture of a face; then you can think of the first layer of the neural network as being a feature detector, or an edge detector.

In this example I am plotting what a neural network with maybe twenty hidden units might be trying to compute on this image, with the twenty hidden units visualized by little square boxes. For example, one small visualization represents a hidden unit that is trying to figure out where edges of a particular orientation are in the image, and another hidden unit might be trying to figure out where the horizontal edges are. When we talk about convolutional networks in a later course, this particular visualization will make a bit more sense. Informally, you can think of the first layer of the neural network as looking at a picture and trying to figure out where the edges in that picture are. Having found the edges by grouping pixels together, the network can then take the detected edges and group them together to form parts of faces.
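
To make the "first layer as edge detector" idea concrete, here is a minimal NumPy sketch (my own illustration, not from the lecture). It applies a hand-crafted, Sobel-like horizontal-edge filter to a tiny synthetic image; a learned first-layer unit plays a similar role, except that its weights are learned rather than fixed, and the function `conv2d_valid` is just an illustrative helper name.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, the operation a first-layer conv unit performs."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny 6x6 "image": bright on top, dark on the bottom (one horizontal edge).
image = np.vstack([np.ones((3, 6)), np.zeros((3, 6))])

# Hand-crafted horizontal-edge filter; a trained first layer would learn
# weights with a broadly similar structure.
horizontal_edge = np.array([[ 1,  2,  1],
                            [ 0,  0,  0],
                            [-1, -2, -1]])

response = conv2d_valid(image, horizontal_edge)
print(response)  # large values along the rows where the edge lies, zero elsewhere
```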

For example, you might have one little neuron trying to see whether it is finding an eye, and a different neuron trying to find part of the nose. By putting together lots of edges, the network can start to detect different parts of faces, and finally, by putting different parts of faces together, such as a nose, an eye, an ear, or a chin, it can try to recognize or detect different types of faces. Intuitively, you can think of the earlier layers of the neural network as detecting simpler functions, like edges, and the later layers as composing them together so that the network can learn more complex functions.


These visualizations will make more sense when we talk about convolutional nets. One technical detail of this visualization: the edge detectors look at relatively small areas of an image, maybe very small regions, whereas the facial feature detectors look at much larger areas of the image. But the main intuition to take away is that the network starts by finding simple things like edges, then builds them up and composes them to detect more complex things like an eye or a nose, and then composes those to find even more complex things. This type of simple-to-complex hierarchical representation, or compositional representation, applies to other types of data besides images and face recognition as well.

For example, if you are trying to build a speech recognition system, it is less obvious how to visualize speech, but if you input an audio clip, then maybe the first layer of the neural network learns to detect low-level audio waveform features, such as whether the tone is going up or down, whether it is white noise or a sibilant sound, and what the pitch is. By taking low-level waveform features like these and composing them, the network can learn to detect basic units of sound, which in linguistics are called phonemes. For example, in the word "cat", the C sound is one phoneme, the A sound is another, and the T sound is another. Having learned to find the basic units of sound, the network can compose them to recognize words in the audio, and then compose those in order to recognize entire phrases or sentences.


So a deep neural network with multiple hidden layers might have its earlier layers learn these lower-level, simpler features, and then have the later, deeper layers put together the simpler things it has detected in order to detect more complex things, such as recognizing the specific words, phrases, or sentences being uttered, in order to carry out speech recognition. What we see is that whereas the earlier layers compute what seem like relatively simple functions of the input, such as where the edges are, by the time you get deep into the network you can actually do surprisingly complex things, such as detecting faces, or detecting words, phrases, or sentences. Some people like to make an analogy between deep neural networks and the human brain: neuroscientists believe that the human brain also starts off detecting simple things, like the edges in what your eyes see, and builds those up to detect more complex things, like the faces you see.
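
As a minimal sketch of the layer-by-layer composition the lecture describes (my own illustration, with illustrative names `forward`, `relu`, and `layer_sizes`; ReLU is used everywhere for brevity, whereas an output layer for classification would normally use a sigmoid), each layer recombines the previous layer's features into something more complex:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, parameters):
    """Forward pass through L layers: each layer builds on the previous
    layer's features (edges -> parts -> whole objects, informally)."""
    a = x
    for W, b in parameters:      # one (W, b) pair per layer
        z = W @ a + b            # linear combination of the previous features
        a = relu(z)              # nonlinearity keeps the composition expressive
    return a

# Toy example: 8 input features -> three hidden layers -> 1 output unit.
rng = np.random.default_rng(0)
layer_sizes = [8, 16, 8, 4, 1]
parameters = [(rng.standard_normal((n_out, n_in)) * 0.1, np.zeros((n_out, 1)))
              for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.standard_normal((8, 1))
print(forward(x, parameters).shape)  # (1, 1)
```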

I think analogies between deep learning and the human brain are sometimes a little bit dangerous, but there is a lot of truth to this being how we think the human brain works: the brain probably detects simple things like edges first, and then puts them together to form more and more complex objects, and that has served as a loose form of inspiration for some deep learning as well. We will say a bit more about the human brain, or about the biological brain, in a later video this week.

The other piece of intuition about why deep neural networks seem to work well is the following. This result comes from circuit theory, which concerns what types of functions you can compute with different logic gates: AND, OR, and NOT gates. Informally, there are functions you can compute with a relatively small but deep neural network, where by small I mean the number of hidden units is relatively small; but if you try to compute the same function with a shallow network, that is, without enough hidden layers, you might require exponentially more hidden units.


Let me give you one example and illustrate this a bit informally. Say you are trying to compute the exclusive OR, or the parity, of all your input features: x1 XOR x2 XOR x3 XOR ... XOR xn, where you have n (or n_x) features. If you build an XOR tree, you first compute the XOR of x1 and x2, then take x3 and x4 and compute their XOR, and so on. Technically, if you are only using AND, OR, and NOT gates, you might need a couple of layers rather than just one to compute a single XOR, but you can still compute it with a relatively small circuit. Continuing like this, you build an XOR tree, and eventually you have a circuit whose output, call it y-hat = y, is the exclusive OR, or the parity, of all of these input bits. With this type of XOR tree, the depth of the network is on the order of log(n), and the number of nodes, circuit components, or gates in the network is not that large; you do not need that many gates to compute the exclusive OR. But now suppose you are not allowed to use a neural network with multiple hidden layers, in this case with on the order of log(n) hidden layers.
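
Here is a small Python sketch (my own illustration, not from the lecture; `parity_tree` is an illustrative name) that computes the parity with a balanced pairwise XOR tree and counts how many 2-input XOR "gates" and how many levels it uses, matching the n - 1 gates and O(log n) depth described above:

```python
from math import ceil, log2

def parity_tree(bits):
    """Compute x1 XOR x2 XOR ... XOR xn with a balanced pairwise tree.
    Returns the parity, the number of 2-input XOR gates, and the depth."""
    level = list(bits)
    gates, depth = 0, 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])  # one 2-input XOR gate
            gates += 1
        if len(level) % 2:                       # odd element passes through
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], gates, depth

bits = [1, 0, 1, 1, 0, 1, 0, 0]                  # n = 8
parity, gates, depth = parity_tree(bits)
print(parity, gates, depth)                      # 0, 7 gates (n - 1), depth 3 (log2 8)
assert gates == len(bits) - 1 and depth == ceil(log2(len(bits)))
```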


If you are forced to compute this function with just one hidden layer, so that all of the inputs feed into a single set of hidden units which then output y, then in order to compute this XOR or parity function the hidden layer will need to be exponentially large, because essentially you need to exhaustively enumerate on the order of 2^n possible configurations of the input bits that result in the exclusive OR being either 1 or 0. So you end up needing a hidden layer that is exponentially large in the number of bits; technically you could do it with 2^(n-1) hidden units, but that is still on the order of 2^n, exponentially large in the number of bits. I hope this gives you a sense that there are mathematical functions that are much easier to compute with deep networks than with shallow networks. I have to admit that I personally find this result from circuit theory less useful for gaining intuition, but it is one of the results people often cite when explaining the value of very deep representations.
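
The contrast in size can be tabulated directly. The sketch below (my own illustration; `units_needed` is an illustrative name, and the counts simply restate the lecture's informal argument rather than constructing the shallow network) compares the n - 1 gates of the depth-log(n) tree with the roughly 2^(n-1) hidden units a single hidden layer would need:

```python
from math import ceil, log2

def units_needed(n):
    """Informal counts for n-bit parity: deep XOR tree vs. one hidden layer."""
    deep_gates = n - 1            # one 2-input XOR per internal tree node
    deep_depth = ceil(log2(n))    # depth of the balanced tree
    shallow_units = 2 ** (n - 1)  # order of hidden units with a single hidden layer
    return deep_gates, deep_depth, shallow_units

for n in (4, 8, 16, 32):
    print(n, units_needed(n))
# e.g. n = 32: 31 gates at depth 5 for the tree, versus about 2**31 hidden units
```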


Now, in addition to these reasons for preferring deep neural networks, to be perfectly honest I think another reason the term deep learning has taken off is just branding. These things used to be called neural networks with a lot of hidden layers, but the phrase "deep learning" is a great brand; it just sounds so deep. Once that term caught on, it was really a rebranding of neural networks, or of neural networks with many hidden layers, and it helped capture the popular imagination as well. But regardless of the PR branding, deep networks do work well. Sometimes people go overboard and insist on using tons of hidden layers, but when I start out on a new problem I will often begin with logistic regression, then try something with one or two hidden layers, and treat the number of hidden layers as a hyperparameter to tune in order to find the right depth for the network. Over the last several years, however, there has been a trend toward finding that for some applications very, very deep neural networks, with maybe many dozens of layers, can sometimes be the best model for a problem. So those are the intuitions for why deep learning seems to work well. Let's now take a look at the mechanics of how to implement not just forward propagation but also back propagation.


Key points:

Why deep representations?

Face recognition and speech recognition:


For face recognition, the first layer of the neural network extracts contours and edges from the raw image, with each neuron learning information about a different edge. The second layer combines the edge information learned by the first layer into local facial features, such as eyes and mouth. The following layers progressively combine the features of the previous layer until they form the whole face. As the number of layers increases, the features expand from edges to the entire face, going from local to global and from simple to complex, so that deeper networks can capture increasingly complex structure.

For speech recognition, the first layer of the network can learn low-level properties of the speech signal such as tone; deeper layers can detect basic phonemes, then words, and with further depth, phrases and whole sentences.

From these two examples we can see that as the depth of the network increases, the model can learn to solve more complex problems and becomes more powerful.

Circuit and logic-gate computation:


Suppose we want to compute the XOR of n inputs:

$$y = x_1 \oplus x_2 \oplus x_3 \oplus \cdots \oplus x_n$$

For this computation, if we use a deep network in which each layer computes the XOR of adjacent pairs of units from the previous layer, down to a single output, the whole network has the shape of a binary tree with depth $O(\log_2(n))$, and the total number of neurons (XOR units) used is:

$$1 + 2 + \cdots + 2^{\log_2(n)-1} = \frac{1 - 2^{\log_2(n)}}{1 - 2} = 2^{\log_2(n)} - 1 = n - 1$$

That is, with $n$ inputs, the deep tree-structured network needs only $n-1$ neurons in total.

However, if we do not use a deep network and instead use a network with only a single hidden layer, the number of hidden units required is on the order of $2^{n-1}$. For the same problem, a deep network therefore needs far fewer neurons than a shallow one.

References:

[1]. 大树先生.吴恩达Coursera深度学习课程 DeepLearning.ai 提炼笔记(1-4)– 浅层神经网络


PS: You are welcome to scan the QR code and follow the official account 「SelfImprovementLab」, which focuses on deep learning, machine learning, and artificial intelligence, and occasionally organizes mutual check-in groups for early rising, reading, exercise, English, and other habits.
