Learning rate
In practice the optimal value can fall anywhere from 1e-4 to 1e-1; a rough rule of thumb is that simpler models tolerate larger learning rates.
[https://blog.csdn.net/weixin_44070747/article/details/94339089]
Other approaches:
Increase the batch size instead of decaying the learning rate
[Ditch Learning Rate Decay https://www.sohu.com/a/218600766_114877]
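A minimal sketch of that schedule (after Smith et al., "Don't Decay the Learning Rate, Increase the Batch Size"): at each point where a conventional schedule would multiply the learning rate by a decay factor, grow the batch size by the inverse factor instead, falling back to lr decay once the batch size hits a hardware cap. All names, the decay factor, and the step interval are illustrative assumptions:

```python
# Illustrative schedule: grow batch size instead of decaying lr.
# base_lr, base_bs, decay, max_bs, and `every` are assumed values.
BASE_LR, BASE_BS, DECAY, MAX_BS = 0.1, 256, 0.5, 4096

def schedule(step, every=1000):
    """Return (lr, batch_size) for a given training step.
    At each decay event, multiply the batch size by 1/DECAY while the
    lr stays fixed; once the batch size would exceed MAX_BS, decay the
    lr as usual instead."""
    n_events = step // every          # decay events elapsed so far
    lr, bs = BASE_LR, BASE_BS
    for _ in range(n_events):
        if bs / DECAY <= MAX_BS:
            bs = int(bs / DECAY)      # same reduction in gradient noise
        else:
            lr *= DECAY               # fall back to ordinary lr decay
    return lr, bs
```

The appeal is that larger batches keep the gradient-noise scale dropping on the same trajectory as lr decay would, while allowing more parallelism late in training.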
learning rate adaptation
bold driver algorithm: after each epoch, compare the network's loss L(t) to its previous value L(t-1). If the loss decreased, increase the learning rate slightly (e.g., multiply it by a factor like 1.05); if the loss increased, reject the update and cut the learning rate sharply (e.g., multiply it by 0.5).
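The rule above can be sketched on a toy 1-D quadratic; the loss function, initial learning rate, and the 1.05 / 0.5 factors here are illustrative choices, not prescribed values:

```python
# Bold driver on L(w) = (w - 3)^2; all constants are illustrative.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.9
prev_loss = loss(w)
for epoch in range(100):
    w_trial = w - lr * grad(w)        # candidate gradient step
    new_loss = loss(w_trial)
    if new_loss < prev_loss:
        w, prev_loss = w_trial, new_loss
        lr *= 1.05                    # loss fell: grow the step slightly
    else:
        lr *= 0.5                     # loss rose: reject step, shrink sharply
```

Note the rejection branch: when the loss increases, the parameter update is discarded, not just the learning rate reduced, which is what keeps the method stable despite the aggressive growth factor.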