Preface
Personal study notes.
1. Local Minima and Saddle Point
Determining whether a critical point is a local minimum or a saddle point
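At a critical point, where the gradient is (close to) zero, the eigenvalues of the Hessian distinguish the two cases: all positive eigenvalues mean a local minimum, while mixed positive and negative eigenvalues mean a saddle point. Below is a minimal sketch of this check on a toy loss; the function, the finite-difference Hessian, and the step size are my own illustrative choices.

```python
import numpy as np

# Toy loss: L(w1, w2) = w1^2 - w2^2 has a critical point at the origin.
def loss(w):
    return w[0] ** 2 - w[1] ** 2

def numerical_hessian(f, w, eps=1e-3):
    """Finite-difference Hessian of f at point w."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (f(w_pp) - f(w_pm) - f(w_mp) + f(w_mm)) / (4 * eps ** 2)
    return H

w = np.zeros(2)                                   # gradient is zero here
eigvals = np.linalg.eigvalsh(numerical_hessian(loss, w))
print(eigvals)                                    # ~[-2, 2]
if np.all(eigvals > 0):
    print("local minimum")
elif np.all(eigvals < 0):
    print("local maximum")
else:
    print("saddle point")                         # mixed signs -> saddle point
```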
2. Batch and Momentum
2.1 Why use batch
2.1.1 Small batch vs Large batch
(1) A large batch sometimes does not need more time to compute its gradient, because the GPU processes the examples in a batch in parallel (unless the batch size is too large).
(2) A small batch sometimes needs more time for one epoch (more time to see all the data once), as shown below.
(3) Yet sometimes the opposite holds: a small batch can optimize better on the training data.
ONE OF THE REASONS:
With a full batch, once the loss reaches a local minimum or a saddle point the update stops and the parameters can no longer move. With small batches, every batch is different: if the current batch gets stuck, the next batch may still be able to keep moving, as sketched below.
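A tiny illustration of this reason, with made-up data and a one-parameter model: at the minimum of the full-batch loss, the full-batch gradient is exactly zero, but the gradient computed on each single example is not, so a small-batch update can still move the parameter.

```python
import numpy as np

# Two training examples for a 1-parameter model y = w * x with squared loss.
xs = np.array([1.0, 1.0])
ys = np.array([1.0, 3.0])

def grad(w, x, y):
    # d/dw (w*x - y)^2 = 2 * (w*x - y) * x
    return 2 * (w * x - y) * x

w = 2.0                                   # minimum of the averaged (full-batch) loss
print(np.mean(grad(w, xs, ys)))           # 0.0  -> full batch is stuck here
print(grad(w, xs[0], ys[0]))              # 2.0  -> batch {example 0} still moves
print(grad(w, xs[1], ys[1]))              # -2.0 -> batch {example 1} still moves
```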
2.1.2 Sometimes small batch is better on testing data
ONE OF THE REASONS:
Sharp minima are likely to trap a large batch but not a small batch (some people believe this explanation, some do not).
2.1.3 A short summary
2.2 What is momentum
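The momentum update keeps a movement vector m that accumulates past gradients instead of following only the current one: m_t = λ·m_{t-1} − η·g_{t-1} and θ_t = θ_{t-1} + m_t. Below is a minimal sketch on a toy quadratic loss; the loss and the values of η and λ are illustrative assumptions.

```python
import numpy as np

def grad(theta):
    return 2 * theta            # gradient of the toy loss L(theta) = theta^2

theta = np.array([5.0])
m = np.zeros_like(theta)        # accumulated movement (momentum)
eta, lam = 0.1, 0.9             # learning rate and momentum coefficient (assumed values)

for step in range(200):
    g = grad(theta)
    m = lam * m - eta * g       # new movement = momentum of last step - lr * gradient
    theta = theta + m           # move by the accumulated direction, not just -eta*g
print(theta)                    # ends close to the minimum at 0
```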
2.3 Short summary
3. Learning rate
3.1 lr cannot be one-size-fits-all
If the gradient along some direction is consistently small (the surface is flat there), the learning rate should be turned up; if the gradient is large (the surface is steep), the learning rate should be turned down. See the sketch after the list of methods below.
(1) Root Mean Square
(2) RMSProp
(3) Adam: RMSProp + Momentum
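A minimal sketch of the three adaptive rules on one parameter; the toy loss and all hyperparameter values are illustrative assumptions. Adagrad divides the learning rate by the root mean square of all past gradients, RMSProp replaces that plain average with an exponential moving average weighted by α, and Adam combines the RMSProp denominator with a momentum numerator plus bias correction.

```python
import numpy as np

eta, alpha, eps = 0.1, 0.9, 1e-8      # illustrative hyperparameters
beta1, beta2 = 0.9, 0.999

def grad(theta):
    return 2 * theta                  # gradient of the toy loss L(theta) = theta^2

# (1) Root Mean Square (Adagrad-style): divide by the RMS of all past gradients.
theta, sum_sq = 5.0, 0.0
for t in range(100):
    g = grad(theta)
    sum_sq += g ** 2
    sigma = np.sqrt(sum_sq / (t + 1))            # root mean square of g_0 ... g_t
    theta -= eta / (sigma + eps) * g
print(theta)                                     # moves toward the minimum at 0

# (2) RMSProp: exponential moving average, so recent gradients matter more.
theta, sq = 5.0, 0.0
for t in range(100):
    g = grad(theta)
    sq = alpha * sq + (1 - alpha) * g ** 2
    theta -= eta / (np.sqrt(sq) + eps) * g
print(theta)                                     # moves toward the minimum at 0

# (3) Adam: RMSProp denominator + momentum numerator, with bias correction.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 101):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    theta -= eta * m_hat / (np.sqrt(v_hat) + eps)
print(theta)                                     # moves toward the minimum at 0
```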
3.2 Learning Rate Scheduling
(1) Learning Rate Decay
(2) Warm up
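A minimal sketch of the two schedules; the peak rate, the step counts, and the cosine shape of the decay are illustrative assumptions. Decay lets η shrink as training approaches the end, while warm up first raises η from a small value to its peak before decaying.

```python
import numpy as np

def lr_decay(step, eta_max=1e-3, total_steps=10_000):
    """Learning rate decay: eta shrinks smoothly toward 0 (cosine decay here)."""
    return eta_max * 0.5 * (1 + np.cos(np.pi * step / total_steps))

def lr_warmup_then_decay(step, eta_max=1e-3, warmup_steps=1_000, total_steps=10_000):
    """Warm up: raise eta linearly first, then decay as above."""
    if step < warmup_steps:
        return eta_max * step / warmup_steps
    return lr_decay(step - warmup_steps, eta_max, total_steps - warmup_steps)

for s in (0, 500, 1_000, 5_000, 10_000):
    print(s, lr_warmup_then_decay(s))   # rises until step 1000, then decays to 0
```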
4. Possible impact of Loss
A rough understanding of softmax:
Squashing y (which can take any value) into the range between 0 and 1, so that it can be compared with the label ŷ, which is 0 or 1.
Why Cross-entropy?
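A small numpy sketch of both points, with made-up logits and a one-hot label: softmax turns arbitrary logits into values between 0 and 1 that sum to 1, and when the prediction is badly wrong, cross-entropy still produces a large gradient on the logits while MSE produces an almost-zero one, i.e. MSE leaves training stuck on a flat region of the error surface.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                     # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

y_hat = np.array([1.0, 0.0, 0.0])       # one-hot label: class 0 is correct
z = np.array([-10.0, 10.0, 0.0])        # badly wrong logits (class 1 favored)
y = softmax(z)
print(y)                                # every entry between 0 and 1, summing to 1

cross_entropy = -np.sum(y_hat * np.log(y + 1e-12))
mse = np.sum((y - y_hat) ** 2)

# Gradients with respect to the logits z:
grad_ce = y - y_hat                     # softmax + cross-entropy
J = np.diag(y) - np.outer(y, y)         # Jacobian of softmax, dy_i/dz_k
grad_mse = J @ (2 * (y - y_hat))        # softmax + MSE

print(grad_ce)    # entries near -1 and 1: large gradient, easy to move away
print(grad_mse)   # every entry tiny (~1e-4 or smaller): flat region, hard to move
```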
5. Batch Normalization
5.1 Why batch normalization
One form of feature normalization:
It makes little difference whether the normalization is applied to z (before the activation) or to a (after the activation). When the activation is sigmoid, normalizing z is recommended. As below:
After x goes through feature normalization, z1, z2, z3 and the following a1, a2, a3 all become coupled: once z1 changes, everything downstream changes as well, so all of these values have to be considered together. Because the whole dataset is too large to consider at once, the normalization is instead computed over one batch, as sketched below.
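A minimal sketch of batch normalization at one layer during training; the batch size, layer width, and the γ/β initialization are illustrative assumptions. μ and σ are computed over the current batch only, each z is standardized with them, and the learnable γ and β can rescale and shift the result.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 4))          # pre-activation z for a batch of 8 examples, 4 neurons
gamma = np.ones(4)                   # learnable scale, initialized to 1
beta = np.zeros(4)                   # learnable shift, initialized to 0
eps = 1e-5

mu = Z.mean(axis=0)                  # mean over the batch, per neuron
sigma = Z.std(axis=0)                # standard deviation over the batch, per neuron
Z_tilde = (Z - mu) / (sigma + eps)   # standardized z: mean ~0, std ~1 within this batch
Z_hat = gamma * Z_tilde + beta       # network can undo the normalization if that helps

a = 1 / (1 + np.exp(-Z_hat))         # sigmoid activation applied after normalizing z
print(Z_tilde.mean(axis=0), Z_tilde.std(axis=0))
```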
5.2 Testing problem
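At test time there may be no batch from which to compute μ and σ, so frameworks such as PyTorch keep a moving average of the batch statistics during training and use those fixed values at inference. A rough sketch of that idea; the momentum value p of the moving average is an illustrative assumption.

```python
import numpy as np

p = 0.9                                    # momentum of the moving average (assumed value)
mu_bar, sigma_bar = np.zeros(4), np.ones(4)

def train_step(Z, mu_bar, sigma_bar):
    mu, sigma = Z.mean(axis=0), Z.std(axis=0)
    # Update the running statistics with the current batch statistics.
    mu_bar = p * mu_bar + (1 - p) * mu
    sigma_bar = p * sigma_bar + (1 - p) * sigma
    return (Z - mu) / (sigma + 1e-5), mu_bar, sigma_bar

def test_step(Z, mu_bar, sigma_bar):
    # No batch statistics at inference: use the running averages instead.
    return (Z - mu_bar) / (sigma_bar + 1e-5)

rng = np.random.default_rng(1)
for _ in range(100):
    Z_tilde, mu_bar, sigma_bar = train_step(rng.normal(size=(8, 4)), mu_bar, sigma_bar)
print(test_step(rng.normal(size=(1, 4)), mu_bar, sigma_bar))   # works even for one example
```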
5.3 Other normalization
A short summary
Batch normalization changes the landscape of the error surface.