Hung-yi Lee (李宏毅) 2020 Machine Learning [Study Notes] P15 Why Deep?

_bh

Last modified 2023-09-04 23:40:10


Category: Machine Learning [Study Notes]. Tags: machine learning, artificial intelligence, study notes

First published 2023-09-02 23:47:46

Article link: https://blog.csdn.net/weixin_51330846/article/details/132625450


So, why deep? why not shallow?

Experiment result

Deep? Modularization!

on DP

How to modularize?

Modularization: Speech

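Thanks to the Bilibili uploader who reposted the course:

[Hung-yi Lee 2020 Machine Learning and Deep Learning (complete version, Mandarin)] https://www.bilibili.com/video/BV1JE411g7XF/?share_source=copy_web&vd_source=262e561fe1b31fc2fea4d09d310b466d

Continuing from the previous post:
Hung-yi Lee 2020 Machine Learning [Study Notes] P12 Brief Intro of DP (_bh's blog)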

We discussed the Universality Theorem in the previous post.

The theorem states that a network with a single hidden layer is enough to realize any continuous function that takes an N-dimensional vector as input and outputs an M-dimensional vector, as long as the layer contains enough neurons.
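To make the shape of such a network concrete, here is a minimal sketch of a one-hidden-layer forward pass in plain Python. The function names, the sigmoid activation, the random weights, and the sizes (N = 4, M = 3, 16 hidden neurons) are all assumptions chosen for illustration; the theorem only says that some sufficiently wide hidden layer exists, not how to find its weights.

```python
import math
import random


def one_hidden_layer(x, w1, b1, w2, b2):
    """Forward pass of a network with a single sigmoid hidden layer."""
    # Hidden layer: one sigmoid neuron per row of w1.
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
        for row, b in zip(w1, b1)
    ]
    # Linear output layer: one output value per row of w2.
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def random_network(n_in, n_hidden, n_out, seed=0):
    """Random (untrained) weights, only to show the shapes involved."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [rng.uniform(-1, 1) for _ in range(n_out)]
    return w1, b1, w2, b2


# N = 4 inputs, M = 3 outputs; the hidden width is the only thing that grows.
params = random_network(n_in=4, n_hidden=16, n_out=3)
y = one_hidden_layer([0.5, -1.0, 2.0, 0.0], *params)
print(len(y))
```

Making the network "fatter" here just means increasing `n_hidden`; the input and output dimensions stay fixed.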

So, why deep? why not shallow?

Experiment result

This table shows the performance of the two kinds of network above.

The first and second columns describe the "deep" networks;

the third and fourth columns describe the "shallow" networks.

*We put the $5\times2K$ deep network and the $1\times 3772$ shallow network in the same row because their parameter counts are of the same order of magnitude.*

The conclusion is that even the $1\times 16K$ shallow network performed worse than the $2\times2K$ deep network.
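The "same order of magnitude" pairing can be checked by counting parameters. The sketch below assumes fully connected layers with biases, reads "2K" as 2048, and uses hypothetical input and output dimensions (429 and 9304) that are not given in these notes; they were picked only so that the two counts land close together. `count_params` is a helper introduced for this illustration.

```python
def count_params(n_in, hidden_widths, n_out):
    """Count weights and biases of a fully connected network."""
    sizes = [n_in] + list(hidden_widths) + [n_out]
    # Each layer has fan_in * fan_out weights plus fan_out biases.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


# Assumed dimensions, NOT taken from the notes.
N_IN, N_OUT = 429, 9304
K = 2048  # assumption: "2K" means 2048 neurons per layer

deep = count_params(N_IN, [K] * 5, N_OUT)        # the 5 x 2K deep network
shallow = count_params(N_IN, [3772], N_OUT)      # the 1 x 3772 shallow network
print(deep, shallow, round(shallow / deep, 3))
```

Under these assumptions the two totals differ by well under one percent, so comparing the two networks in the same row is a comparison at a roughly equal parameter budget.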

Deep? Modularization!

We wouldn't put everything in the main function; we would define many subfunctions, and subfunctions of subfunctions...

These subfunctions act like modules: we can call them whenever we need them. Putting everything together instead would take much more space, because every time a piece of functionality is needed, its code has to be written again.
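As a small illustration of the programming side of the analogy, the sketch below builds each function out of the one before it. The statistics example itself is an arbitrary choice, not something from the lecture.

```python
def mean(values):
    """Basic module: arithmetic mean."""
    return sum(values) / len(values)


def variance(values):
    """Reuses mean() twice instead of repeating the summation code."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def standardize(values):
    """Reuses both mean() and variance(): a subfunction of subfunctions."""
    m = mean(values)
    sd = variance(values) ** 0.5
    return [(v - m) / sd for v in values]


print(standardize([1.0, 2.0, 3.0]))
```

Without `mean`, the same averaging code would have to be written out in three places.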

on DP

Suppose the task is to classify people into four classes: girls with long hair, girls with short hair, boys with long hair, and boys with short hair.

Obviously, we would have less data for "boys with long hair", so the classifier for this class would be weak after training.

How to modularize?

Each classifier shares the basic classifiers in the previous layer and uses their outputs as modules.

Now the weak classifier performs better, because the basic classifiers it relies on can be trained with plenty of data.

Each layer uses the previous layer as its modules...

How to modularize is learned automatically by the machine from the data.

Because of modularization, less training data is needed.
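The sharing described above can be sketched as a two-layer structure in plain Python. Everything here is hypothetical: the input fields `gender_cue` and `hair_cm`, the thresholds, and the hand-written rules stand in for neurons whose weights a real network would learn from data.

```python
def is_boy(x):
    """Basic classifier (hypothetical rule): 1.0 for boy, 0.0 for girl."""
    return 1.0 if x["gender_cue"] > 0 else 0.0


def has_long_hair(x):
    """Basic classifier (hypothetical rule): 1.0 if hair is long."""
    return 1.0 if x["hair_cm"] > 20 else 0.0


def classify(x):
    # First layer: the basic classifiers are computed once and shared.
    boy, long_hair = is_boy(x), has_long_hair(x)
    # Second layer: each of the four classifiers only combines the shared outputs.
    scores = {
        "boy, long hair": boy * long_hair,
        "boy, short hair": boy * (1 - long_hair),
        "girl, long hair": (1 - boy) * long_hair,
        "girl, short hair": (1 - boy) * (1 - long_hair),
    }
    return max(scores, key=scores.get)


print(classify({"gender_cue": 1, "hair_cm": 30}))
```

The "boy, long hair" classifier needs no module of its own: it reuses `is_boy`, which can learn from every boy in the data, and `has_long_hair`, which can learn from everyone with long hair.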

Modularization: Speech

Analogy

Looking back at the Universality Theorem, a shallow network can in fact represent any function (but it has to be very "fat"). However, using a deep structure is more effective.