Lecture 2 Extra: Optimization for Deep Learning
Table of Contents
- New Optimizers for Deep Learning
- What have you learned before?
- Some Notations
- What is Optimization about?
- On-line vs Off-line
- Optimizers: Real Application
- Adam vs SGDM
- Towards Improving Adam
- Towards Improving SGDM
- Does Adam need warm-up?
- k steps forward, 1 step back
- More than momentum
- Adam in the future
- Do you really know your optimizer?
- Something helps optimization
- Summary
New Optimizers for Deep Learning
What have you learned before?
- SGD
- SGD with momentum (SGDM)
- Adagrad
- RMSProp
- Adam
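For reference, here is a recap of the standard update rules for these five optimizers. Notation: $\theta_t$ are the parameters at step $t$, $g_t = \nabla L(\theta_{t-1})$ is the gradient, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability; the exact symbols and the SGDM formulation differ slightly between textbooks and the lecture slides, so treat this as one common convention rather than the lecture's exact notation.

$$
\begin{aligned}
\text{SGD:}\quad & \theta_t = \theta_{t-1} - \eta\, g_t \\[4pt]
\text{SGDM:}\quad & m_t = \lambda\, m_{t-1} - \eta\, g_t, \qquad \theta_t = \theta_{t-1} + m_t \\[4pt]
\text{Adagrad:}\quad & \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\sum_{i=1}^{t} g_i^2} + \epsilon}\, g_t \\[4pt]
\text{RMSProp:}\quad & v_t = \alpha\, v_{t-1} + (1-\alpha)\, g_t^2, \qquad \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t \\[4pt]
\text{Adam:}\quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
& \hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{aligned}
$$

The progression is the key point: SGDM adds momentum to SGD, Adagrad and RMSProp adapt the learning rate per parameter, and Adam combines momentum with RMSProp-style adaptation plus bias correction.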
Some Notations
What is Optimization about?
On-line vs Off-line
Optimizers: Real Application
Adam vs SGDM
Towards Improving Adam
Simply combine Adam with SGDM?
Troubleshooting
How can we make Adam converge both quickly and well?
AMSGrad [Reddi et al., ICLR'18]
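As a quick sketch of the idea (following the usual presentation of Reddi et al.; the original algorithm is stated without Adam's bias-correction terms): AMSGrad keeps Adam's first- and second-moment estimates but divides by a running maximum of the second moment, so the denominator can only grow:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{v}_t &= \max(\hat{v}_{t-1},\, v_t) \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, m_t
\end{aligned}
$$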
AMSGrad only handles the problem of overly large learning rates: because $\hat{v}_t$ never decreases, the effective per-parameter step size $\eta / (\sqrt{\hat{v}_t} + \epsilon)$ can only shrink over time. This suppresses sudden large updates caused by a long run of small gradients, but it does nothing when the adapted learning rate has already become too small.