Theory
- SGD's default learning rate is 0.01; Adam's is 0.001.
- When fine-tuning from a pretrained model, reduce the learning rate to 10% of its original value.
- When the batch size grows by a factor of k, theory suggests scaling the learning rate by sqrt(k), but in practice the linear rule (scaling by k) works better.
- Estimate the optimal learning rate empirically (e.g. with an LR range test).
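The batch-size scaling rules above can be sketched as a small helper; `scaled_lr` and its parameters are hypothetical names for illustration, not from any library:

```python
import math

def scaled_lr(base_lr: float, base_batch: int, new_batch: int, rule: str = "linear") -> float:
    """Scale a learning rate when the batch size changes by a factor k.

    rule="linear": lr * k       (the rule that tends to work better in practice)
    rule="sqrt":   lr * sqrt(k) (the theoretically derived rule)
    """
    k = new_batch / base_batch
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * math.sqrt(k)
    raise ValueError(f"unknown rule: {rule}")

# Example: going from batch size 256 to 1024 (k = 4) at base lr 0.1
print(scaled_lr(0.1, 256, 1024, "linear"))  # 0.4
print(scaled_lr(0.1, 256, 1024, "sqrt"))    # 0.2
```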
Code
- pytorch-lr-finder
- fastai, pytorch-lightning, and Keras all provide LR-finder support.
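What these libraries do is an LR range test: exponentially sweep the learning rate over a few hundred iterations, record the loss at each step, and pick a rate somewhat below the point where the loss starts to blow up. A minimal pure-Python sketch of the idea on a toy scalar problem (real libraries run this against your model and data loader; all names here are illustrative):

```python
def lr_schedule(start_lr: float, end_lr: float, num_iter: int):
    """Exponentially increase lr from start_lr to end_lr over num_iter steps."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_iter - 1)) for i in range(num_iter)]

def range_test(loss_grad_fn, w0: float, start_lr=1e-5, end_lr=10.0, num_iter=100):
    """Take one SGD step per learning rate and record the loss before each step.

    The recorded curve typically falls, flattens, then diverges; a common
    heuristic is to choose an lr roughly 10x below the divergence point.
    """
    losses, w = [], w0
    for lr in lr_schedule(start_lr, end_lr, num_iter):
        loss, grad = loss_grad_fn(w)
        losses.append(loss)
        w -= lr * grad
    return losses

# Toy objective standing in for a model: loss = w^2, gradient = 2w
losses = range_test(lambda w: (w * w, 2 * w), w0=1.0)
```

With this toy objective the loss shrinks while lr is small and diverges once lr exceeds 1, reproducing the characteristic range-test curve.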
References
- On the sqrt(k) relationship with batch size: "One weird trick for parallelizing convolutional neural networks" (2014, theoretical derivation).
- On the linear relationship with batch size: "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour" (2017, empirical results).
- "How should the learning rate change as the batch size changes?" (Stack Overflow)
- "Visualizing Learning rate vs Batch size" (blog post)