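Study Notes | PyTorch Tutorial 18
These notes are drawn mainly from "深度之眼" and are summarized here for quick reference.
PyTorch version used: 1.2
- Why adjust the learning rate?
- PyTorch's six learning rate scheduling strategies
- Summary of learning rate scheduling
I. Why adjust the learning rate?
Gradient descent: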
$$W_{i+1}=W_{i}-g(W_{i})$$
$$W_{i+1}=W_{i}-LR*g(W_{i})$$
The learning rate (LR) controls the step size of each update.
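As a rough illustration of why the step size matters, the sketch below applies the update rule above to a toy objective f(w) = w^2. The objective, the starting point, the number of steps and the three learning rates are assumptions picked for illustration; they are not from the original notes.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimise f(w) = w**2 with the update W_{i+1} = W_i - LR * g(W_i)."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w        # g(w) = df/dw for f(w) = w**2
        w = w - lr * grad     # one gradient descent step
    return w


# Hypothetical learning rates: too large, moderate, very small.
for lr in (1.5, 0.1, 0.01):
    print("lr={:<5} final w={:.6g}".format(lr, gradient_descent(lr)))
```

Under these assumed values, the largest rate makes |w| grow at every step, the smallest is still far from 0 after 20 steps, and the moderate one gets close to the minimum. This is one motivation for starting with a larger rate and reducing it as training proceeds.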
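1.class _LRScheduler
Main attributes:
- optimizer: the associated optimizer
- last_epoch: records the number of epochs
- base_lrs: records the initial learning rates
Main methods:
- step(): updates the learning rate for the next epoch
- get_lr(): virtual function that computes the learning rate for the next epoch
For the complete code, see: Study Notes | PyTorch Tutorial 05 (Dataloader and Dataset)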
Set a breakpoint on the following line and step into it: scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
The debugger enters the class StepLR(_LRScheduler).
Step into super(StepLR, self).__init__(optimizer, last_epoch) to observe the initialization.
Step out, then step into scheduler.step()  # update the learning rate
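To make the roles of optimizer, last_epoch, base_lrs, step() and get_lr() concrete, here is a minimal pure-Python sketch of the pattern. It is a simplified illustration, not the PyTorch source: ToyOptimizer, ToyScheduler and ToyStepLR are hypothetical names, and the real classes do more bookkeeping.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer: it only holds param_groups."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]


class ToyScheduler:
    """Simplified sketch of the base-class idea (not the real _LRScheduler)."""

    def __init__(self, optimizer, last_epoch=-1):
        self.optimizer = optimizer
        # Remember the initial learning rate of every parameter group.
        self.base_lrs = [group["lr"] for group in optimizer.param_groups]
        self.last_epoch = last_epoch
        self.step()  # moves last_epoch to 0 and applies the initial lr

    def get_lr(self):
        raise NotImplementedError  # each strategy defines its own rule

    def step(self):
        self.last_epoch += 1
        # Write the newly computed learning rates back into the optimizer.
        for group, lr in zip(self.optimizer.param_groups, self.get_lr()):
            group["lr"] = lr


class ToyStepLR(ToyScheduler):
    """Multiply the lr by gamma once every step_size epochs."""

    def __init__(self, optimizer, step_size, gamma=0.1, last_epoch=-1):
        self.step_size = step_size
        self.gamma = gamma
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        return [base_lr * self.gamma ** (self.last_epoch // self.step_size)
                for base_lr in self.base_lrs]


optimizer = ToyOptimizer(lr=0.1)
scheduler = ToyStepLR(optimizer, step_size=10, gamma=0.1)
for epoch in range(25):
    scheduler.step()
print(scheduler.last_epoch, optimizer.param_groups[0]["lr"])
```

The point of the sketch is the division of labour: the subclass only supplies get_lr(), while step() in the base class advances the epoch counter and pushes the result into the optimizer.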
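II. PyTorch's six learning rate scheduling strategies
1.StepLR
Function: adjusts the learning rate at equal intervals
Main parameters:
- step_size: number of epochs between adjustments
- gamma: multiplicative adjustment factor
Update rule: $lr = lr * gamma$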
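Applying the rule once per interval gives a closed form, sketched below. The function name step_lr is hypothetical; the numeric values are the ones used in the StepLR test code of this section (LR = 0.1, step_size = 50, gamma = 0.1, max_epoch = 200).

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at `epoch`: one multiplication by gamma per completed interval."""
    return base_lr * gamma ** (epoch // step_size)


base_lr, step_size, gamma, max_epoch = 0.1, 50, 0.1, 200

# Epochs at which the learning rate differs from the previous epoch.
change_points = [epoch for epoch in range(1, max_epoch)
                 if step_lr(base_lr, epoch, step_size, gamma)
                 != step_lr(base_lr, epoch - 1, step_size, gamma)]
print(change_points)  # [50, 100, 150]
```

So within the 200 epochs of the test the learning rate is constant except for three drops by a factor of 10.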
Test code:
import torch
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

torch.manual_seed(1)

LR = 0.1
iteration = 10
max_epoch = 200

# ------------------------------ fake data and optimizer ------------------------------
weights = torch.randn((1), requires_grad=True)
target = torch.zeros((1))
optimizer = optim.SGD([weights], lr=LR, momentum=0.9)

# ------------------------------ 1 Step LR ------------------------------
# flag = 0
flag = 1
if flag:
    scheduler_lr = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # set the learning rate decay policy
    lr_list, epoch_list = list(), list()
    for epoch in range(max_epoch):
        # get_lr() as used with PyTorch 1.2; from 1.4 on, get_last_lr() is the recommended call
        lr_list.append(scheduler_lr.get_lr())
        epoch_list.append(epoch)
        for i in range(iteration):
            loss = torch.pow((weights - target), 2)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        scheduler_lr.step()  # update the learning rate once per epoch
    plt.plot(epoch_list, lr_list, label="Step LR Scheduler")
    plt.xlabel("Epoch")
    plt.ylabel("Learning rate")
    plt.legend()
    plt.show()
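Output: [figure: learning rate vs. epoch, "Step LR Scheduler"]
2.MultiStepLR
Function: adjusts the learning rate at the given milestones
Main parameters:
- milestones: the epochs at which the learning rate is adjusted
- gamma: multiplicative adjustment factor
Update rule: $lr = lr * gamma$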
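The rule is the same as for StepLR, except that it fires at the listed milestones rather than at a fixed interval. A minimal sketch of the resulting closed form follows; the function name multistep_lr is hypothetical, and the milestones are the ones from the test code of this section.

```python
from bisect import bisect_right


def multistep_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate at `epoch`: one multiplication by gamma per milestone reached."""
    reached = bisect_right(sorted(milestones), epoch)  # milestones <= epoch
    return base_lr * gamma ** reached


milestones = [50, 125, 160]
for epoch in (0, 49, 50, 124, 125, 159, 160, 199):
    print(epoch, multistep_lr(0.1, epoch, milestones))
```

bisect_right counts a milestone as reached on the epoch equal to it, so the drop takes effect at epoch 50 itself, not at 51.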
Test code:
# ------------------------------ 2 Multi Step LR ------------------------------
# flag = 0
flag = 1
if flag:
    milestones = [50, 125, 160]
    scheduler_lr = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)
    lr_list, epoch_list = list(), list()
    for epoch in range(max_epoch):
        lr_list.append(scheduler_lr.get_lr())
        epoch_list.append(epoch)
        for i in range(iteration):
            loss = torch.pow((weights - target), 2)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        scheduler_lr.step()  # update the learning rate once per epoch
    plt.plot(epoch_list, lr_list, label="Multi Step LR Scheduler\nmilestones:{}".format(milestones))
    plt.xlabel("Epoch")
    plt.ylabel("Learning rate")
    plt.legend()
    plt.show()
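Output: [figure: learning rate vs. epoch, "Multi Step LR Scheduler"]
3.ExponentialLR
Function: decays the learning rate exponentially
Main parameters:
- gamma: base of the exponential
Update rule: $lr = base\_lr * gamma^{epoch}$, where base_lr is the initial learning rate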
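The exponential rule can be read in two equivalent ways: as a closed form in the initial learning rate, or as "multiply the current learning rate by gamma after every epoch". The sketch below checks that the two agree; exponential_lr is a hypothetical helper, and the values 0.1 and 0.95 are taken from the test code of this section.

```python
def exponential_lr(base_lr, epoch, gamma):
    """Closed form: initial learning rate times gamma to the power of the epoch."""
    return base_lr * gamma ** epoch


base_lr, gamma = 0.1, 0.95

lr = base_lr
recursive = []
for epoch in range(5):
    recursive.append(lr)
    lr = lr * gamma  # per-epoch update

closed_form = [exponential_lr(base_lr, epoch, gamma) for epoch in range(5)]
for epoch, (a, b) in enumerate(zip(recursive, closed_form)):
    print(epoch, a, b)
```

The two columns match up to floating-point rounding.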
Test code:
# ------------------------------ 3 Exponential LR ------------------------------
# flag = 0
flag = 1
if flag:
    gamma = 0.95
    scheduler_lr = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    lr_list, epoch_list = list(), list()
    for epoch in range(max_epoch):
        lr_list.append(scheduler_lr.get_lr())
        epoch_list.append(epoch)
        for i in range(iteration):
            loss = torch.pow((weights - target), 2)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        scheduler_lr.step()  # update the learning rate once per epoch
    plt.plot(epoch_list, lr_list, label="Exponential LR Scheduler\ngamma:{}".format(gamma))
    plt.xlabel("Epoch")
    plt.ylabel("Learning rate")
    plt.legend()
    plt.show()
Output: [figure: learning rate vs. epoch, "Exponential LR Scheduler"]
4.CosineAnnealingLR
Function: adjusts the learning rate along a cosine curve
Main parameters:
- T_max: number of epochs in the descending half of the cycle
- eta_min: lower bound of the learning rate
Update rule:
$$\eta_{t}=\eta_{\min}+\frac{1}{2}\left(\eta_{\max}-\eta_{\min}\right)\left(1+\cos\left(\frac{T_{cur}}{T_{\max}}\pi\right)\right)$$
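The formula can be evaluated directly to see the shape of the schedule. The sketch below is a plain transcription of it; cosine_annealing_lr is a hypothetical helper, eta_max is taken to be the initial learning rate, and the values 0.1, 0 and 50 are from the test code of this section.

```python
import math


def cosine_annealing_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t_cur / t_max))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t_cur / t_max))


for t_cur in (0, 25, 50, 75, 100):
    print(t_cur, cosine_annealing_lr(t_cur, t_max=50, eta_max=0.1, eta_min=0.0))
```

With T_max = 50 the value starts at eta_max, passes the midpoint at epoch 25, reaches eta_min at epoch 50, and climbs back to eta_max at epoch 100, so one full cycle of this closed form spans 2 * T_max epochs.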
Test code:
# ------------------------------ 4 Cosine Annealing LR ------------------------------
# flag = 0
flag = 1
if flag:
    t_max = 50
    scheduler_lr = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=t_max, eta_min=0.)
    lr_list, epoch_list = list(), list()
    for epoch in range(max_epoch):
        lr_list.append(scheduler_lr.get_lr())
        epoch_list.append(epoch)
        for i in range(iteration):
            loss = torch.pow((weights - target), 2)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        scheduler_lr.step()  # update the learning rate once per epoch
    plt.plot(epoch_list, lr_list, label="CosineAnnealingLR Scheduler\nT_max:{}".format(t_max))
    plt.xlabel("Epoch")
    plt.ylabel("Learning rate")
    plt.legend()
    plt.show()
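Output: [figure: learning rate vs. epoch, "CosineAnnealingLR Scheduler"]
5.ReduceLROnPlateau
Function: monitors a metric and adjusts the learning rate when the metric stops improving
Main parameters:
- mode: min or max
- factor: multiplicative adjustment factor
- patience: how many epochs without improvement are tolerated
- cooldown: number of epochs for which monitoring is suspended after an adjustment
- verbose: whether to print a log message
- min_lr: lower bound of the learning rate
- eps: minimum decay of the learning rate; a smaller change is ignored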
Test code:
# ------------------------------ 5 Reduce LR On Plateau ------------------------------
# flag = 0
flag = 1
if flag:
    loss_value = 0.5
    accuracy = 0.9

    factor = 0.1
    mode = "min"
    patience = 10
    cooldown = 10
    min_lr = 1e-4
    verbose = True

    scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=factor, mode=mode, patience=patience,
                                                        cooldown=cooldown, min_lr=min_lr, verbose=verbose)
    for epoch in range(max_epoch):
        for i in range(iteration):
            # train(...)
            optimizer.step()
            optimizer.zero_grad()
        # if epoch == 5:
        #     loss_value = 0.4
        scheduler_lr.step(loss_value)  # pass the monitored metric to the scheduler
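Output:
Epoch 11: reducing learning rate of group 0 to 1.0000e-02.
Epoch 32: reducing learning rate of group 0 to 1.0000e-03.
Epoch 53: reducing learning rate of group 0 to 1.0000e-04.
With patience = 10, the learning rate is reduced once the loss has gone more than 10 epochs without improving, which first happens at epoch 11. Why does the next reduction come at epoch 32, a gap of 32 - 11 = 21 epochs? Because cooldown = 10: after each reduction the scheduler waits 10 epochs before it resumes counting, and it then needs another 11 epochs without improvement to exceed patience again.
Now enable:
if epoch == 5:
    loss_value = 0.4
Output:
Epoch 16: reducing learning rate of group 0 to 1.0000e-02.
Epoch 37: reducing learning rate of group 0 to 1.0000e-03.
Epoch 58: reducing learning rate of group 0 to 1.0000e-04.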
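The interplay of patience and cooldown is easier to follow in a small simulation of the counters. The sketch below is a simplified pure-Python model of the bookkeeping for mode="min", not the PyTorch implementation: plateau_reduction_epochs is a hypothetical helper, "improved" is taken as a strict decrease (PyTorch's threshold parameter is ignored), and only one parameter group is modelled. The parameter values are those of the test code above.

```python
def plateau_reduction_epochs(losses, lr=0.1, factor=0.1, patience=10,
                             cooldown=10, min_lr=1e-4, eps=1e-8):
    """Return the epochs at which the learning rate would be reduced."""
    best = float("inf")
    num_bad_epochs = 0
    cooldown_counter = 0
    reduced_at = []
    for epoch, loss in enumerate(losses):
        if loss < best:              # metric improved: reset the counter
            best = loss
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        if cooldown_counter > 0:     # during cooldown bad epochs are not counted
            cooldown_counter -= 1
            num_bad_epochs = 0
        if num_bad_epochs > patience:
            new_lr = max(lr * factor, min_lr)
            if lr - new_lr > eps:    # ignore reductions smaller than eps
                lr = new_lr
                reduced_at.append(epoch)
            cooldown_counter = cooldown
            num_bad_epochs = 0
    return reduced_at


constant_loss = [0.5] * 200
loss_drops_at_5 = [0.5] * 5 + [0.4] * 195
print(plateau_reduction_epochs(constant_loss))    # [11, 32, 53]
print(plateau_reduction_epochs(loss_drops_at_5))  # [16, 37, 58]
```

Under these assumptions the model reproduces both logs above. It also shows why no fourth message appears: once the learning rate has reached min_lr, a further reduction would change it by less than eps and is skipped.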
6.LambdaLR
Function: user-defined adjustment strategy
Main parameters:
- lr_lambda: function or list
Test code:
# ------------------------------ 6 lambda ------------------------------
# flag = 0
flag = 1
if flag:
    lr_init = 0.1

    weights_1 = torch.randn((6, 3, 5, 5))
    weights_2 = torch.ones((5, 5))

    # two parameter groups, each driven by its own lambda
    optimizer = optim.SGD([
        {'params': [weights_1]},
        {'params': [weights_2]}], lr=lr_init)

    lambda1 = lambda epoch: 0.1 ** (epoch // 20)
    lambda2 = lambda epoch: 0.95 ** epoch

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

    lr_list, epoch_list = list(), list()
    for epoch in range(max_epoch):
        for i in range(iteration):
            # train(...)
            optimizer.step()
            optimizer.zero_grad()

        scheduler.step()  # update the learning rate once per epoch

        lr_list.append(scheduler.get_lr())
        epoch_list.append(epoch)

        print('epoch:{:5d}, lr:{}'.format(epoch, scheduler.get_lr()))

    plt.plot(epoch_list, [i[0] for i in lr_list], label="lambda 1")
    plt.plot(epoch_list, [i[1] for i in lr_list], label="lambda 2")
    plt.xlabel("Epoch")
    plt.ylabel("Learning Rate")
    plt.title("LambdaLR")
    plt.legend()
    plt.show()
Output:
epoch:    0, lr:[0.1, 0.095]
epoch:    1, lr:[0.1, 0.09025]
epoch:    2, lr:[0.1, 0.0857375]
epoch:    3, lr:[0.1, 0.081450625]
......
epoch:  193, lr:[1.0000000000000006e-10, 4.768474077305593e-06]
epoch:  194, lr:[1.0000000000000006e-10, 4.5300503734403135e-06]
epoch:  195, lr:[1.0000000000000006e-10, 4.3035478547682975e-06]
epoch:  196, lr:[1.0000000000000006e-10, 4.088370462029883e-06]
epoch:  197, lr:[1.0000000000000006e-10, 3.883951938928388e-06]
epoch:  198, lr:[1.0000000000000006e-10, 3.6897543419819688e-06]
epoch:  199, lr:[1.0000000000000006e-11, 3.5052666248828703e-06]
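The printed values can be checked by hand: for each parameter group, LambdaLR multiplies the initial learning rate by that group's function evaluated at the scheduler's epoch counter. The sketch below is a pure-Python illustration with a hypothetical helper lambda_lr; the two lambdas and the initial learning rate 0.1 are copied from the test code above.

```python
def lambda_lr(base_lrs, lr_lambdas, last_epoch):
    """Learning rate of each parameter group: its initial lr times its own factor."""
    return [base_lr * fn(last_epoch) for base_lr, fn in zip(base_lrs, lr_lambdas)]


base_lrs = [0.1, 0.1]
lr_lambdas = [lambda epoch: 0.1 ** (epoch // 20),
              lambda epoch: 0.95 ** epoch]

# The loop in the test code calls scheduler.step() before reading the lr,
# so the line printed at loop index `epoch` corresponds to last_epoch = epoch + 1.
for epoch in (0, 1, 199):
    print(epoch, lambda_lr(base_lrs, lr_lambdas, epoch + 1))
```

This explains why the first printed line already shows 0.095 for the second group, and why the first group drops to about 1e-11 on the last line: by then the counter has reached 200, a multiple of 20.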
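III. Summary of learning rate scheduling
- Scheduled adjustment: Step, MultiStep, Exponential and CosineAnnealing
- Adaptive adjustment: ReduceLROnPlateau
- Custom adjustment: Lambda
Learning rate initialization:
- Use a small value: 0.01, 0.001, 0.0001
- Search for the maximum learning rate: "Cyclical Learning Rates for Training Neural Networks"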