1. Fixed
base_lr = 0.01
lr_policy = "fixed"
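With the fixed policy the learning rate simply stays constant at base_lr for the whole of training.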
2. Step
Every stepsize iterations the learning rate is multiplied by gamma: $lr = lr \times gamma$
base_lr = 0.01
lr_policy = "step"
gamma = 0.1
stepsize = 10000
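With these settings the learning rate is 0.01 for iterations 0–9999, drops to 0.001 at iteration 10000, to 0.0001 at iteration 20000, and so on.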
3. Polynomial decay (poly)
$LR(t) = base\_lr \times (1 - \frac{t}{T})^{power}$, where $t$ is the current iteration and $T$ is max_iter.
base_lr = 0.01
lr_policy = "poly"
power = 0.5
max_iter = 10000
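With power = 0.5 (an illustrative value) the learning rate follows a square-root-shaped curve from 0.01 down to 0 over max_iter iterations.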
4. Inv
The learning rate decreases as the iteration count grows: $LR(t) = base\_lr \times (1 + gamma \times iter)^{-power}$
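For reference, a minimal solver configuration for the inv policy might look like this (the gamma and power values are illustrative):
base_lr = 0.01
lr_policy = "inv"
gamma = 0.0001
power = 0.75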
5. Custom learning rate schedules in PyTorch
5.1. Implementation with a lambda function
import torch
from torch.optim import SGD

init_lr = 5e-4
max_iter = 15000
# net is assumed to be an existing nn.Module
optimizer = SGD(params=net.parameters(), lr=init_lr, momentum=0.9, weight_decay=0.0005)
# the lambda returns the factor that multiplies init_lr, not the lr itself
lambda_func = lambda step: (1 - step / max_iter) ** 0.9
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_func)
for epoch in range(15):  # 15 epochs x 1000 iterations = max_iter total steps
    for i in range(1000):
        ...
        optimizer.step()
        scheduler.step()  # step once per iteration, so step <= max_iter
Note that the lambda function should return only the factor by which init_lr is multiplied, not the learning rate itself.
5.2. Implementation with a custom class
Subclass the torch.optim.lr_scheduler._LRScheduler class:
from torch.optim.lr_scheduler import _LRScheduler

class PolyLRDecay(_LRScheduler):
    def __init__(self, optimizer, max_decay_steps, end_learning_rate=0.0001, power=1):
        ...
Two methods need to be implemented:
def get_lr(self):
def step(self, step=None):
- get_lr: computes the new learning rate value(s)
- step: performs one scheduler update
For an example, see poly_lr_decay (section 5.3); a minimal sketch follows.
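A sketch of such a class implementing poly decay (the class name, constructor arguments, and the end_learning_rate floor are illustrative, not a fixed API):

from torch.optim.lr_scheduler import _LRScheduler

class PolyLRDecay(_LRScheduler):
    # Polynomial decay from the optimizer's initial lr down to end_learning_rate.
    def __init__(self, optimizer, max_decay_steps, end_learning_rate=0.0001, power=1.0):
        self.max_decay_steps = max_decay_steps
        self.end_learning_rate = end_learning_rate
        self.power = power
        super().__init__(optimizer)  # records base_lrs and applies the initial lr

    def get_lr(self):
        # last_epoch counts how many times step() has been called
        step = min(self.last_epoch, self.max_decay_steps)
        factor = (1 - step / self.max_decay_steps) ** self.power
        return [(base_lr - self.end_learning_rate) * factor + self.end_learning_rate
                for base_lr in self.base_lrs]

The inherited _LRScheduler.step() already advances last_epoch and applies get_lr(), so a custom step() override is only needed for non-standard bookkeeping.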
5.3. Implementing poly_lr_decay
$lr = init\_lr \times (1 - \frac{iter}{max\_iter})^{power}$, with power = 0.9 in the code below.
import torch
from torch.optim import SGD
import matplotlib.pyplot as plt

init_lr = 5e-4
max_iter = 15000
# net is assumed to be an existing nn.Module
optimizer = SGD(params=net.parameters(), lr=init_lr, momentum=0.9, weight_decay=0.0005)
lambda_func = lambda step: (1 - step / max_iter) ** 0.9  # poly decay, power = 0.9
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_func)
lrs = []
for epoch in range(15):
    for i in range(1000):
        ...
        optimizer.step()
        lrs.append(optimizer.param_groups[0]["lr"])
        scheduler.step()
plt.plot(range(15000), lrs)
plt.show()
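The plot shows the learning rate decaying from 5e-4 toward 0 along the poly curve with power 0.9.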
5.4. Other common learning rate schedulers
They are all defined in the torch.optim.lr_scheduler module. For the common learning rate schedules and their curves, see: kaggle_pytorch_lr_scheduler
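As a quick reference, here is how a few of the built-in schedulers are constructed (the parameter values are illustrative; in practice a single scheduler is attached to an optimizer):

import torch
from torch.optim import SGD

net = torch.nn.Linear(10, 2)  # placeholder model for illustration
optimizer = SGD(net.parameters(), lr=0.01)

# multiply lr by gamma every step_size steps (cf. Caffe's "step" policy)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
# multiply lr by gamma at the given milestones
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)
# multiply lr by gamma every step
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
# cosine annealing from the initial lr down to eta_min over T_max steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=0)
# reduce lr when a monitored metric stops improving
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=10)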