DL4J learning-rate decay policies
package org.deeplearning4j.nn.conf;
/**
* Learning Rate Policy
*
* How to decay learning rate during training.
*
* <p><b>None</b> = do not apply decay policy aka fixed in Caffe <br>
* <p><b>Exponential</b> = applies decay rate to the power of the # batches <br>
* <p><b>Inverse</b> = divide learning rate by negative (1 + decay rate * # batches)^power <br>
* <p><b>Poly</b> = polynomial decay that hits 0 when iterations are complete <br>
* <p><b>Sigmoid</b> = sigmoid decay rate <br>
* <p><b>Step</b> = decay rate to the power of the floor (nearest integer) of # of batches by # of steps <br>
* <p><b>Schedule</b> = rate to use at a specific iteration <br>
* <p><b>Score</b> = apply decay when score stops improving <br>
*/
// TODO provide options using epochs in addition to iterations
public enum LearningRatePolicy {
None, Exponential, Inverse, Poly, Sigmoid, Step, TorchStep, Schedule, Score
}
DL4J applies its learning-rate decay policy after backpropagation has computed the gradients: Updater.update() is called to update the gradients and apply the decay. The call site is the statement updater.update(layer, gradient, getIterationCount(model), batchSize); inside the updateGradientAccordingToParams(Gradient gradient, Model model, int batchSize) method of the abstract class BaseOptimizer in the org.deeplearning4j.optimize.solvers package.
- The iteration used by the decay policies below is the model's total iteration count.
- The decay rate is configured with the lrPolicyDecayRate() method of NeuralNetConfiguration.Builder(); it is typically a value between 0 and 1.
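As a sketch of how such a policy is selected, assuming the pre-1.0 (0.x) DL4J API where NeuralNetConfiguration.Builder exposes learningRateDecayPolicy() and lrPolicyDecayRate() (the exact builder methods may differ between 0.x versions):

```java
import org.deeplearning4j.nn.conf.LearningRatePolicy;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

// Illustrative configuration only: chooses the Exponential policy with a
// decay rate of 0.9, so the effective rate becomes lr * 0.9^iteration.
NeuralNetConfiguration conf = new NeuralNetConfiguration.Builder()
        .learningRate(0.1)
        .learningRateDecayPolicy(LearningRatePolicy.Exponential)
        .lrPolicyDecayRate(0.9)
        .build();
```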
Exponential
newLr = lr * Math.pow(decayRate, iteration);
newLr = lr × decayRate^iteration
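A standalone numeric check of the formula above (plain Java, not the DL4J code path; the method name exponentialDecay is illustrative):

```java
public class ExponentialDecayDemo {
    // Exponential policy: newLr = lr * decayRate^iteration
    static double exponentialDecay(double lr, double decayRate, int iteration) {
        return lr * Math.pow(decayRate, iteration);
    }

    public static void main(String[] args) {
        double lr = 0.1, decayRate = 0.9;
        // The rate shrinks geometrically as the iteration count grows.
        for (int iter = 0; iter <= 3; iter++) {
            System.out.printf("iteration %d: lr = %.6f%n",
                    iter, exponentialDecay(lr, decayRate, iter));
        }
    }
}
```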
Inverse
newLr = lr / Math.pow((1 + decayRate * iteration), conf.getLrPolicyPower());
newLr = lr / (1 + decayRate × iteration)^power
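The same kind of standalone check for the Inverse policy (plain Java, not the DL4J code path; inverseDecay is an illustrative name, and power corresponds to conf.getLrPolicyPower() in the statement above):

```java
public class InverseDecayDemo {
    // Inverse policy: newLr = lr / (1 + decayRate * iteration)^power
    static double inverseDecay(double lr, double decayRate, int iteration, double power) {
        return lr / Math.pow(1 + decayRate * iteration, power);
    }

    public static void main(String[] args) {
        double lr = 0.1, decayRate = 0.5, power = 2.0;
        // Decay is slower than Exponential early on, polynomial in the tail.
        for (int iter = 0; iter <= 3; iter++) {
            System.out.printf("iteration %d: lr = %.6f%n",
                    iter, inverseDecay(lr, decayRate, iter, power));
        }
    }
}
```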