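Thanks to the Bilibili uploader who reposted the course:
[Hung-yi Lee 2020 Machine Learning & Deep Learning (Full Version), Mandarin] https://www.bilibili.com/video/BV1JE411g7XF/?share_source=copy_web&vd_source=262e561fe1b31fc2fea4d09d310b466d
Continuing from the previous post:
Hung-yi Lee 2020 Machine Learning [Study Notes] P12 Brief Intro of DP (_bh's blog, CSDN)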
We talked about the Universality Theorem in the previous post.
The theorem states that a network with a single hidden layer can approximate any continuous function that takes an N-dimensional vector as input and outputs an M-dimensional vector, provided the hidden layer has enough neurons.
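To make the theorem concrete, here is a minimal sketch for the 1-D case (N = M = 1), assuming ReLU hidden units. Instead of training, it builds the weights by hand: hidden unit k computes ReLU(x - x_k), and its output weight is the change in slope at breakpoint x_k, so the whole network reproduces the piecewise-linear interpolation of f. The helper names (build_shallow_net, forward, max_error) are illustrative, not from the lecture.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def build_shallow_net(f, a, b, n_hidden):
    """Hand-built 1-hidden-layer ReLU net that linearly interpolates f
    at n_hidden + 1 evenly spaced points on [a, b]."""
    xs = [a + (b - a) * k / n_hidden for k in range(n_hidden + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(n_hidden)]
    # Hidden unit k computes relu(x - xs[k]); its output weight is the slope change at xs[k].
    out_w = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n_hidden)]
    hidden_b = [-xs[k] for k in range(n_hidden)]
    return ys[0], hidden_b, out_w  # (output bias, hidden biases, output weights)

def forward(net, x):
    out_b, hidden_b, out_w = net
    return out_b + sum(w * relu(x + hb) for w, hb in zip(out_w, hidden_b))

def max_error(f, net, a, b, n_test=1000):
    grid = (a + (b - a) * i / n_test for i in range(n_test + 1))
    return max(abs(f(x) - forward(net, x)) for x in grid)

for n in (4, 16, 64):
    net = build_shallow_net(math.sin, 0.0, 2 * math.pi, n)
    print(n, max_error(math.sin, net, 0.0, 2 * math.pi))
```

The error shrinks as the hidden layer gets wider, which is the "enough neurons" part of the theorem; accuracy is bought purely by adding width.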
So, why deep? Why not shallow?
Experimental results
This table compares the performance of the two kinds of NN shown above.
The first and second columns are for the "deep" NNs;
the third and fourth columns are for the "shallow" NNs.
*Each row pairs a deep network with a shallow one because their numbers of parameters are of the same order of magnitude.*
The conclusion is that the shallow network performs worse than the deep one, even when the two have a comparable number of parameters.
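"Same order of magnitude" can be checked by counting weights and biases. The sketch below assumes fully connected layers; the layer sizes are hypothetical and are not the ones used in the lecture's table. matching_shallow_width picks the widest single hidden layer whose parameter count does not exceed the deep network's.

```python
def count_params(layer_sizes):
    """Weights + biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def matching_shallow_width(d_in, d_out, target_params):
    """Largest hidden width H whose 1-hidden-layer net has at most target_params parameters."""
    # A 1-hidden-layer net has d_in*H + H + H*d_out + d_out parameters.
    return (target_params - d_out) // (d_in + 1 + d_out)

deep = [100, 512, 512, 512, 10]          # hypothetical sizes, not the lecture's
deep_params = count_params(deep)
h = matching_shallow_width(100, 10, deep_params)
shallow_params = count_params([100, h, 10])
print(deep_params, h, shallow_params)    # 582154 5244 582094
```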
Why deep? Modularization!
When writing a program, we don't put everything in the main function; we define many subfunctions, subfunctions of subfunctions, and so on.
These subfunctions act as modules that we can call whenever we need them. Putting everything together instead would waste a lot of space, because every time the same functionality is needed, the code would have to be written again.
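A tiny illustration of the programming analogy, using ordinary statistics helpers chosen for this sketch (they are not from the lecture): each higher-level function is built by calling lower-level ones instead of re-implementing them.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)                          # reuses mean() instead of re-implementing it
    return mean([(x - m) ** 2 for x in xs])

def std(xs):
    return math.sqrt(variance(xs))        # builds on variance(), which builds on mean()

def z_scores(xs):
    m, s = mean(xs), std(xs)              # two modules reused at once
    return [(x - m) / s for x in xs]

print(std([2, 4, 4, 4, 5, 5, 7, 9]))      # 2.0
```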
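Modularization in deep learning
Suppose the task is to classify images into four classes: girls with long hair, girls with short hair, boys with long hair, and boys with short hair.
We would naturally have much less data for "boys with long hair", so the classifier for that class ends up weak after training.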
How to modularize?
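First, basic classifiers are trained for the individual attributes ("boy or girl?" and "long or short hair?"). Each of the four final classifiers then shares these basic classifiers in the previous layer, using their outputs as modules.
Because each basic classifier can be trained on all of the data, even the weak classifier (boys with long hair) now performs better.
In a deep network, each layer uses the layer before it as modules, and so on.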
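A minimal sketch of this two-layer structure. The input features (a gender cue and a hair length in cm) are hypothetical, and all weights are hand-picked placeholders rather than learned values; in a real DNN both layers would be learned. The point is that each final classifier is a single sigmoid neuron acting as a soft AND over the shared basic outputs, so it has an easy job.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def soft_and(p: float, q: float) -> float:
    # A single sigmoid neuron approximating logical AND of two probabilities.
    return sigmoid(10.0 * (p + q) - 15.0)

def basic_classifiers(x):
    """Layer 1 (shared modules). x = (gender_cue, hair_length_cm) are hypothetical features;
    the weights below are hand-picked placeholders, not trained values."""
    gender_cue, hair_len = x
    p_boy = sigmoid(4.0 * gender_cue)
    p_long = sigmoid(0.5 * (hair_len - 15.0))
    return p_boy, p_long

def final_classifiers(x):
    """Layer 2: all four final classifiers reuse the same two basic outputs."""
    p_boy, p_long = basic_classifiers(x)
    return {
        "girl_long_hair": soft_and(1 - p_boy, p_long),
        "girl_short_hair": soft_and(1 - p_boy, 1 - p_long),
        "boy_long_hair": soft_and(p_boy, p_long),
        "boy_short_hair": soft_and(p_boy, 1 - p_long),
    }

scores = final_classifiers((2.0, 40.0))   # strong "boy" cue, 40 cm hair
print(max(scores, key=scores.get))        # boy_long_hair
```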
How to modularize is learned automatically by the machine from data.
Thanks to modularization, a deep network needs less training data.
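One way to see the "less training data" claim is to count positive examples per classifier. The counts below are hypothetical and deliberately imbalanced (they are not from the lecture); this is a counting illustration, not a proof. The rare combination contributes to both basic classifiers, each of which learns from every example.

```python
def positives_per_classifier(counts):
    """counts maps (gender, hair) -> number of training examples of that combination."""
    # Direct approach: each of the four classifiers only has its own class as positives.
    direct = {f"{g}_{h}_hair": n for (g, h), n in counts.items()}
    # Modular approach: each basic classifier pools examples across the other attribute.
    basic = {
        "boy?": sum(n for (g, _), n in counts.items() if g == "boy"),
        "long_hair?": sum(n for (_, h), n in counts.items() if h == "long"),
    }
    return direct, basic

# Hypothetical counts, not from the lecture
counts = {("girl", "long"): 400, ("girl", "short"): 100,
          ("boy", "long"): 10, ("boy", "short"): 490}
direct, basic = positives_per_classifier(counts)
print(direct["boy_long_hair"], basic["boy?"], basic["long_hair?"])   # 10 500 410
```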
Modularization: Speech
Analogy
Recall the Universality Theorem: a shallow network can in fact represent any function, but it may need to be extremely "fat" (have a huge number of neurons). A deep structure is more efficient.
The same holds for logic circuits: a circuit with two layers of logic gates can represent any Boolean function, but not efficiently.
With multiple layers of logic gates, some functions can be built much more simply, with far fewer gates.
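Parity (is the number of 1 bits odd?) is a standard example of this. The sketch below compares one two-level AND-OR construction (one AND gate per odd-weight input pattern, then a single OR) with a chain of XOR gates; both compute the same function, but the gate counts grow as 2^(d-1) + 1 versus d - 1. The function names are illustrative.

```python
from itertools import product

def parity_two_layer(bits):
    """Layer 1: one AND gate per odd-weight input pattern; layer 2: a single OR gate."""
    d = len(bits)
    minterms = [p for p in product((0, 1), repeat=d) if sum(p) % 2 == 1]
    and_outputs = [all(b == m for b, m in zip(bits, mt)) for mt in minterms]
    return int(any(and_outputs)), len(minterms) + 1      # (output, gate count)

def parity_deep(bits):
    """Chain of XOR gates: each layer combines the running parity with one more bit."""
    acc, gates = bits[0], 0
    for b in bits[1:]:
        acc ^= b
        gates += 1
    return acc, gates                                    # (output, gate count)

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(parity_two_layer(x), parity_deep(x))   # (0, 129) (0, 7)
```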
More Analogies
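My understanding: a multi-layer NN extracts features layer by layer. The earlier layers first extract simple features, and the later layers build on them to learn more complex features, much like making paper-cut window decorations, where folding the paper several times lets a few simple cuts produce an intricate pattern.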
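The folding idea can be sketched numerically. This uses a standard construction that is not from the lecture: one layer with two ReLU units implements the "tent" fold 2*ReLU(x) - 4*ReLU(x - 0.5) on [0, 1], and stacking k such layers gives a sawtooth with 2^k linear pieces. A single hidden layer of H ReLU units on a 1-D input can produce at most H + 1 pieces, so matching depth k that way would need at least 2^k - 1 units, whereas the deep version uses only 2k.

```python
def relu(z: float) -> float:
    return max(0.0, z)

def fold(x: float) -> float:
    # One layer with two ReLU units: the tent map on [0, 1] (2x, then 2 - 2x).
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_fold(x: float, depth: int) -> float:
    # Stack the same simple layer `depth` times: each fold doubles the number of pieces.
    for _ in range(depth):
        x = fold(x)
    return x

def count_linear_pieces(f, n=2 ** 12):
    """Count linear pieces of f on [0, 1] by detecting slope changes on a fine dyadic grid."""
    xs = [i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) * n for i in range(n)]
    return 1 + sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)

for depth in (1, 2, 4, 8):
    print(depth, count_linear_pieces(lambda x: deep_fold(x, depth)))
# 1 2
# 2 4
# 4 16
# 8 256
```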
Can be used in...
End-to-end Learning
Production line
We only provide the input and the output;
what each simple function along the line should do is learned automatically.
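A minimal sketch of end-to-end learning, assuming a two-stage "production line" (stage 1: tanh features, stage 2: weighted sum) trained jointly by full-batch gradient descent with hand-written backpropagation. Only (input, output) pairs are given; nobody tells stage 1 what its intermediate output should be. The target y = x^2 and all hyperparameters are arbitrary choices for illustration.

```python
import math
import random

def train_end_to_end(data, n_hidden=8, lr=0.02, epochs=1000, seed=0):
    """Train x -> h (stage 1) -> y_hat (stage 2) jointly; return (initial MSE, final MSE)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]   # stage 1 weights
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]   # stage 1 biases
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]   # stage 2 weights
    b2 = 0.0                                             # stage 2 bias

    def forward(x):
        h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]    # stage 1: simple features
        y_hat = sum(v * hj for v, hj in zip(w2, h)) + b2      # stage 2: combine features
        return h, y_hat

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    initial_loss = mse()
    for _ in range(epochs):
        g_w1, g_b1, g_w2 = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden
        g_b2 = 0.0
        for x, y in data:
            h, y_hat = forward(x)
            d_out = 2.0 * (y_hat - y) / len(data)            # dL/dy_hat for mean squared error
            g_b2 += d_out
            for j in range(n_hidden):
                g_w2[j] += d_out * h[j]
                d_pre = d_out * w2[j] * (1.0 - h[j] ** 2)    # backprop through tanh
                g_w1[j] += d_pre * x
                g_b1[j] += d_pre
        for j in range(n_hidden):                            # update both stages together
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return initial_loss, mse()

data = [(x, x * x) for x in (i / 10 - 1 for i in range(21))]
initial, final = train_end_to_end(data)
print(f"MSE before: {initial:.4f}, after: {final:.4f}")
```

Gradients are accumulated over the whole dataset before any parameter moves, so both stages are updated from the same end-to-end error signal.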
Complex Tasks
Very similar input, different output.
Very different input, similar output.
In speech recognition, speaker normalization is done automatically by the DNN.