1. SGD
2. SGDM
3. Adagrad
4. RMSProp
5. Adam
6. SWATS
7. AMSGrad
8. AdaBound
9. Cyclical LR
10. SGDR
11. One-cycle LR
12. RAdam
13. Lookahead
14. NAG
15. Nadam
16. AdamW & SGDW with momentum
17. Other techniques
17.1 Shuffling
17.2 Dropout
17.3 Gradient noise
17.4 Warm-up
17.5 Curriculum learning
17.6 Fine-tuning
17.7 Normalization
17.8 Regularization
18. Summary