Common Learning Rate Decay Methods
1· torch.optim.lr_scheduler.LambdaLR()
torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)
This scheduler sets each parameter group's learning rate to the initial lr multiplied by a factor computed from the epoch number, so the resulting lr depends only on the epoch and the lambda you supply (applied to the initial lr, not the current one).
Parameters:
optimizer (Optimizer) – optimizer.
lr_lambda (function or list) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups.
last_epoch (int) – The index of the last epoch. Default: -1.
Example:
>>> # Assuming optimizer has two groups.
>>> lambda1 = lambda epoch: epoch // 30  # an anonymous (lambda) function; see https://blog.csdn.net/yrwang_xd/article/details/106446071 if this syntax is unfamiliar
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>> train(...)
>>> validate(...)
>>> scheduler.step()
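The doctest above is not self-contained, so here is a runnable sketch of the same idea; the two tiny parameters stand in for a real model, and `train(...)`/`validate(...)` are omitted.

```python
import torch

# Two parameter groups, each with its own lambda, as in the example above.
w1 = torch.nn.Parameter(torch.zeros(1))
w2 = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([{"params": [w1], "lr": 0.05},
                             {"params": [w2], "lr": 0.05}])

lambda1 = lambda epoch: epoch // 30    # integer factor: 0 for epochs 0-29, 1 for 30-59, ...
lambda2 = lambda epoch: 0.95 ** epoch  # exponential factor
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

for epoch in range(3):
    # train(...); validate(...) would go here
    optimizer.step()
    scheduler.step()

# Each group's lr is initial_lr * lambda(epoch) -- NOT a running product.
lrs = scheduler.get_last_lr()
print(lrs)  # [0.05 * (3 // 30), 0.05 * 0.95**3]
```

Note how group 1 is still at lr 0 (since `3 // 30 == 0`) while group 2 has decayed exponentially: the factor is recomputed from scratch at every epoch.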
This class provides two methods:
load_state_dict(state_dict)
Loads the schedulers state.
Parameters
state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
state_dict()
Returns the state of the scheduler as a dict.
It contains an entry for every variable in self.__dict__ which is not the optimizer. The learning rate lambda functions will only be saved if they are callable objects and not if they are functions or lambdas.
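Because plain lambdas are not saved in the state dict (per the note above), checkpointing a LambdaLR means recreating the scheduler with the same lambda and then loading the state. A minimal round-trip sketch:

```python
import torch

fn = lambda epoch: 0.95 ** epoch
w = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([w], lr=0.05)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=fn)

for _ in range(10):
    optimizer.step()
    scheduler.step()

# An ordinary dict, safe to pass to torch.save; the lambda itself is
# NOT stored because it is a plain function.
state = scheduler.state_dict()

# Resuming: rebuild with the same lambda, then restore the counters.
w2 = torch.nn.Parameter(torch.zeros(1))
optimizer2 = torch.optim.SGD([w2], lr=0.05)
scheduler2 = torch.optim.lr_scheduler.LambdaLR(optimizer2, lr_lambda=fn)
scheduler2.load_state_dict(state)

print(scheduler2.last_epoch)  # 10 -- the decay resumes where it left off
```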
2· torch.optim.lr_scheduler.MultiplicativeLR()
torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1)
This scheduler multiplies each parameter group's current learning rate by the factor returned by lr_lambda, so unlike LambdaLR the decay compounds from epoch to epoch.
Parameters:
Same as in 1·.
Example:
>>> lmbda = lambda epoch: 0.95
>>> scheduler = MultiplicativeLR(optimizer, lr_lambda=lmbda)
>>> for epoch in range(100):
>>> train(...)
>>> validate(...)
>>> scheduler.step()
The following two methods work the same as in 1·:
load_state_dict(state_dict)
state_dict()
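A runnable sketch to make the compounding behavior concrete: a constant factor of 0.95 applied to the current lr each epoch gives a geometric decay, where LambdaLR with the same constant would leave the lr flat.

```python
import torch

w = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([w], lr=0.05)
# The factor multiplies the CURRENT lr every epoch, so 0.95 compounds.
scheduler = torch.optim.lr_scheduler.MultiplicativeLR(
    optimizer, lr_lambda=lambda epoch: 0.95)

for epoch in range(4):
    optimizer.step()
    scheduler.step()

lr = scheduler.get_last_lr()[0]
print(lr)  # approximately 0.05 * 0.95**4
```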
3· torch.optim.lr_scheduler.StepLR()
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)
Parameters:
optimizer (Optimizer) – Wrapped optimizer.
step_size (int) – Period of learning rate decay, in epochs (the width of each plateau in the figure below).
gamma (float) – Multiplicative factor of learning rate decay: each new lr is gamma times the previous one. Default: 0.1.
last_epoch (int) – The index of last epoch. Default: -1.
Example:
>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 (the initial lr) if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 60
>>> # lr = 0.0005 if 60 <= epoch < 90
>>> # ...
>>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
>>> for epoch in range(100):
>>> train(...)
>>> validate(...)
>>> scheduler.step()
The figure below illustrates this schedule with an initial learning rate of 0.01.
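The commented schedule above can be verified directly; this sketch records the lr actually used at each epoch and checks the tenfold drop every 30 epochs.

```python
import torch

w = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([w], lr=0.05)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]["lr"])  # lr in effect this epoch
    optimizer.step()
    scheduler.step()

# Approximately 0.05, 0.005, 0.0005, 5e-05 (up to floating-point noise)
print(lrs[0], lrs[30], lrs[60], lrs[90])
```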
4· torch.optim.lr_scheduler.MultiStepLR()
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)
Parameters:
optimizer (Optimizer) – Wrapped optimizer.
milestones (list) – The epoch indices at which the learning rate decays. Must be increasing.
gamma (float) – Multiplicative factor of learning rate decay: each new lr is gamma times the previous one. Default: 0.1.
last_epoch (int) – The index of last epoch. Default: -1.
Example:
>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 80
>>> # lr = 0.0005 if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
>>> for epoch in range(100):
>>> train(...)
>>> validate(...)
>>> scheduler.step()
The figure below shows an example with the milestones set to [5, 20, 25, 80] and an initial learning rate of 0.01.
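As with StepLR, the schedule in the comments above can be checked by recording the lr per epoch: it is multiplied by gamma at each milestone and stays flat in between.

```python
import torch

w = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([w], lr=0.05)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 80], gamma=0.1)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()

# Approximately 0.05, 0.005, 0.005, 0.0005: drops only at epochs 30 and 80
print(lrs[29], lrs[30], lrs[79], lrs[80])
```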
5· torch.optim.lr_scheduler.ExponentialLR()
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)
Parameters:
optimizer (Optimizer) – Wrapped optimizer.
gamma (float) – Multiplicative factor of learning rate decay.
last_epoch (int) – The index of last epoch. Default: -1.
Figure:
A pure exponential decay: the learning rate is multiplied by gamma every single epoch. The figure below uses an initial learning rate of 0.01.
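A minimal sketch using the figure's initial lr of 0.01 and an assumed gamma of 0.9 (the figure's gamma is not stated): the lr at epoch t is 0.01 * 0.9**t.

```python
import torch

w = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([w], lr=0.01)
# gamma=0.9 is an illustrative choice, not taken from the figure
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

lrs = []
for epoch in range(5):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()

# Approximately [0.01, 0.009, 0.0081, 0.00729, 0.006561]
print(lrs)
```

ExponentialLR is simply StepLR with step_size=1, or MultiplicativeLR with a constant factor.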
The five methods above are the most commonly used learning rate decay schemes; newer and more sophisticated methods will be added in future updates.