李宏毅 (Hung-yi Lee) 2020 Machine Learning [Study Notes] P15: Why Deep?

Contents

So, why deep? Why not shallow?

Experiment result

Deep? Modularization!

On DP

How to modularize?

Modularization: Speech

Analogy

More Analogy

Can be used in...

End-to-end Learning

Production line

Complex Task


Thanks to the Bilibili uploader for re-uploading the course:

【李宏毅2020机器学习深度学习(完整版)国语】 https://www.bilibili.com/video/BV1JE411g7XF/?share_source=copy_web&vd_source=262e561fe1b31fc2fea4d09d310b466d


Continuing from the previous post:
李宏毅2020机器学习 【学习笔记】 P12Brief Intro of DP__bh的博客-CSDN博客

We talked about the universal approximation theorem in the last post.

The theorem states that one hidden layer is enough to realize any function that maps an N-dimensional vector to an M-dimensional vector, as long as you place enough neurons in that layer.
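The one-wide-layer claim can be sketched numerically. The snippet below is my own illustration, not from the lecture: it uses random tanh features for the hidden layer and fits only the linear output layer by least squares, which is enough to fit a simple 1-D target closely.

```python
import numpy as np

# A minimal numeric sketch of the universality idea (my illustration,
# not from the lecture): one wide hidden layer of random tanh features
# plus a learned linear readout can fit a 1-D function closely.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

n_hidden = 500                       # "enough neurons" in the one layer
W = rng.normal(size=(1, n_hidden))   # random input weights
b = rng.normal(size=n_hidden)
H = np.tanh(x @ W + b)               # hidden-layer activations (200 x 500)

# Only the output layer is fitted, by least squares
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
err = float(np.max(np.abs(H @ w_out - y)))
print(f"max fitting error: {err:.2e}")
```

With 500 neurons the fit is essentially exact on the 200 sample points, which is the theorem's point: width alone buys expressive power.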

So, why deep? Why not shallow?

Experiment result

This table shows the performance of the two forms of NN above.

The first & second columns are about the "Deep" NN;

The third & fourth columns are about the "Shallow" NN.

*We put the 5×2k deep network and the 1×3772 shallow network in the same row since their parameter counts have the same order of magnitude.*

The conclusion is that even the 1×16k shallow network performed worse than the 2×2k deep one.
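That conclusion is striking once you count parameters. A back-of-the-envelope check (my own arithmetic; the input size 429 and output size 1000 are placeholders, not the lecture's exact dimensions):

```python
# Count weights + biases of a fully-connected net to compare the
# 2x2k deep net against the 1x16k shallow net from the table.
# The input/output sizes 429 and 1000 are illustrative placeholders.
def n_params(layer_sizes):
    """Total weights + biases for the given layer sizes."""
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))

d_in, d_out = 429, 1000
deep_2x2k = n_params([d_in, 2000, 2000, d_out])    # 6,863,000
shallow_16k = n_params([d_in, 16000, d_out])       # 22,881,000
print(deep_2x2k, shallow_16k)
```

Under these assumed dimensions the 1×16k shallow net has roughly three times *more* parameters than the 2×2k deep net, yet the table says it performs worse, so the deep net's advantage is not just extra capacity.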

Deep? Modularization 模组化!

We wouldn't put everything in the main function; we would define a lot of subfunctions, subfunctions of subfunctions, and so on.

These subfunctions are like modules: we can call them whenever we need them, instead of putting everything together, which would waste space because every caller would have to rewrite the same logic again.
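The programming analogy in actual code (a toy example of mine): a shared helper is written once and reused by every caller, instead of duplicating its logic everywhere.

```python
# A shared "module": written once, reused by every caller below.
def normalize(xs):
    """Scale a list so its entries sum to 1."""
    s = sum(xs)
    return [x / s for x in xs]

def to_probabilities(scores):
    return normalize(scores)                         # reuse, don't rewrite

def mix(a, b):
    return normalize([x + y for x, y in zip(a, b)])  # reuse again
```

Deep networks, the argument goes, get the same benefit: a lower layer plays the role of `normalize`, serving many higher-level "callers" at once.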

on DP

Suppose we have a task of classifying images into long-haired girls, short-haired girls, long-haired boys, and short-haired boys.

Obviously, we would have less data about "boys with long hair", so that classifier will be weak after training.

How to modularize?

Each classifier shares the basic classifiers in the previous layer (using their outputs) as modules.

Now the weak classifier has better performance.

Each layer in turn uses the previous layer as modules...

How to modularize is learned automatically by the machine from the data.

Thanks to modularization, less training data is needed.
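Structurally, the idea looks like the sketch below (my illustration; shapes and random weights are placeholders, and no training is shown): one "basic" layer is computed once, and all four classifiers, including the data-poor "long-haired boy" one, read from its output.

```python
import numpy as np

# Shared-module sketch: one basic feature layer feeds four task heads.
# Shapes and weights are illustrative only; nothing here is trained.
rng = np.random.default_rng(0)
d_in, d_shared = 8, 4

W_shared = rng.normal(size=(d_in, d_shared))   # learned from ALL the data

def shared_features(x):
    return np.tanh(x @ W_shared)               # the reused module

# Four tiny linear heads, one per class, all on the shared features
heads = {name: rng.normal(size=d_shared)
         for name in ["long-hair girl", "short-hair girl",
                      "long-hair boy", "short-hair boy"]}

x = rng.normal(size=d_in)
h = shared_features(x)                         # computed once, shared
scores = {name: float(h @ w) for name, w in heads.items()}
print(scores)
```

The rare-class head only has to learn a small readout on top of features that were fitted using everyone's data, which is why it needs fewer examples of its own.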

Modularization: Speech

Analogy 类比

Look back at the Universality Theorem: a shallow network can indeed represent any function (but it would have to be very "fat"). However, a deep structure is more efficient.

It is like logic circuits: two layers of gates can represent any Boolean function, but not efficiently.

More layers can build the same function out of much simpler pieces.
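The classic circuit example behind this analogy is d-bit parity (my addition, a standard textbook case rather than something from the slides): a flat two-layer circuit written as an OR of AND terms needs one term per odd-parity input, i.e. 2^(d-1) terms, while a layered chain of XOR gates needs only d - 1 gates.

```python
from itertools import product

# d-bit parity: flat two-layer circuit vs layered XOR chain.
def dnf_terms(d):
    """AND terms in the flat OR-of-ANDs parity circuit (one per odd input)."""
    return sum(1 for bits in product([0, 1], repeat=d) if sum(bits) % 2)

def xor_gates(d):
    """Gates in the layered circuit x1 ^ x2 ^ ... ^ xd."""
    return d - 1

for d in (4, 8, 12):
    print(d, dnf_terms(d), xor_gates(d))   # 2**(d-1) vs d-1
```

Exponential versus linear cost for the exact same function: that is the "deep is more efficient" claim in its sharpest form.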

More Analogy

My understanding: a multi-layer NN extracts features layer by layer. The earlier layers extract simple features first, and the later layers use them to learn more complex features, much like folding the paper before cutting a window flower: each fold multiplies the complexity of the final pattern.

Can be used in...

End-to-end Learning

Production line

We only offer the input & output;

what each simple function in between should do is learned automatically.
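The production-line view can be pictured as a composition of stages. In this toy illustration of mine the stage functions are trivial placeholders, not a real recognizer; the point is only that the pipeline is one composed function, and end-to-end training would tune every stage jointly instead of hand-designing each one.

```python
from functools import reduce

# A pipeline as a composition of simple stages (placeholders only).
def compose(*stages):
    return lambda x: reduce(lambda acc, f: f(acc), stages, x)

pipeline = compose(
    lambda sig: [abs(v) for v in sig],       # "feature extraction" stage
    lambda feats: sum(feats),                # "aggregation" stage
    lambda score: 1 if score > 1.0 else 0,   # "decision" stage
)
print(pipeline([0.5, -0.8, 0.2]))
```

In a hand-engineered system each of these stages would be designed separately; end-to-end learning replaces them with learnable pieces and lets the data decide what each one should do.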

Complex Task

Very similar inputs, different outputs.

Very different inputs, similar outputs.

Speaker normalization is automatically done by the DNN in speech recognition.
