PyTorch Learning Rate Schedulers

Latest recommended article published on 2025-03-27 16:34:24

云中君不见

Tags: pytorch, deep learning

Article link: https://blog.csdn.net/cendrier/article/details/129128877

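A learning rate scheduler dynamically adjusts the learning rate during training according to a chosen strategy, making the learning rate a function of the epoch.

The sections below introduce the schedulers commonly used in PyTorch and how to use them.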

ExponentialLR

$lr_{\text{epoch}} = \gamma \cdot lr_{\text{epoch} - 1}$

The learning rate of the next epoch is gamma ($\gamma$) times the learning rate of the current epoch.

scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)

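Assuming an initial learning rate of 100 and gamma=0.1, the learning rate becomes 100, 10, 1, 0.1, ... over successive epochs.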
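The decay can be checked numerically. The following is a minimal sketch, assuming a single dummy parameter and a plain SGD optimizer (neither appears in the original post); it records the learning rate seen in each of five epochs.

```python
import torch

# Assumption: one dummy parameter is enough to build an optimizer for the demo.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=100.0)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)

exp_lrs = []
for epoch in range(5):
    # Learning rate in effect during this epoch.
    exp_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # no gradients exist, so the parameter is left unchanged
    scheduler.step()   # multiply the learning rate by gamma

print(exp_lrs)
```

Up to floating-point rounding, the recorded values should shrink by a factor of 10 per epoch, starting from 100.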

StepLR

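StepLR is similar to ExponentialLR. The difference is that ExponentialLR decays the learning rate every epoch, whereas StepLR decays it once every step_size epochs, where step_size is a parameter we specify.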

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

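As a rough illustration of the step_size=2 setting, the sketch below traces the learning rate over six epochs. The dummy parameter, the SGD optimizer, and the initial learning rate of 100 are assumptions made for the demo.

```python
import torch

# Assumption: dummy parameter and SGD optimizer, used only to drive the scheduler.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=100.0)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

step_lrs = []
for epoch in range(6):
    step_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()   # decays only when the epoch count is a multiple of step_size

print(step_lrs)
```

The learning rate should stay constant for two epochs at a time and then drop by a factor of 10.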

MultiStepLR

Each time the epoch reaches a milestone, the learning rate is multiplied by gamma.

scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[6,8,9], gamma=0.1)

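To see the milestones [6, 8, 9] in action, here is a small sketch that records the learning rate over ten epochs. The dummy parameter, the SGD optimizer, and the initial learning rate of 100 are assumptions for illustration.

```python
import torch

# Assumption: dummy parameter and SGD optimizer, used only to drive the scheduler.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=100.0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[6, 8, 9], gamma=0.1
)

multistep_lrs = []
for epoch in range(10):
    multistep_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()   # decays only when the epoch count hits a milestone

print(multistep_lrs)
```

The learning rate should remain at 100 for epochs 0 to 5 and then drop by a factor of 10 at epochs 6, 8, and 9.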

CosineAnnealingLR

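The learning rate varies periodically along a cosine curve:

$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$

Here $T_{cur}$ is the number of epochs elapsed so far.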
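Parameters:

T_max: half the period of the learning rate cycle
eta_min: the minimum learning rate, 0 by default

In addition, eta_max in the formula equals the initial learning rate.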

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)

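The claim that T_max is half the period can be checked with a short sketch. It assumes a dummy parameter, an SGD optimizer, and an initial learning rate of 1.0, and compares the scheduler's output against the standard closed-form cosine expression with eta_min = 0.

```python
import math
import torch

# Assumption: dummy parameter and SGD optimizer, used only to drive the scheduler.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=1.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0)

cosine_lrs = []
for epoch in range(21):   # 2 * T_max + 1 epochs, i.e. one full period
    cosine_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()

# Closed form with eta_min = 0 and eta_max = 1.0 (the initial learning rate).
closed_form = [0.5 * (1 + math.cos(math.pi * t / 10)) for t in range(21)]

print(cosine_lrs[0], cosine_lrs[10], cosine_lrs[20])
```

The learning rate should reach its minimum at epoch 10 (T_max) and return to the initial value at epoch 20 (2 * T_max).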

OneCycleLR

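The OneCycleLR policy was proposed in the paper Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates, which reports that it can speed up convergence.

As the name suggests, OneCycleLR runs a single cycle of learning rate change: the learning rate rises from its initial value to the specified maximum max_lr, then falls back to a value lower than the initial learning rate.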

torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None,
                                    steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos')

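Parameters:

max_lr: the maximum learning rate
total_steps: the total number of update steps; it should equal epochs * steps_per_epoch. If this parameter is given, epochs and steps_per_epoch do not need to be specified
pct_start: the fraction of the total steps spent increasing the learning rate
anneal_strategy: the annealing strategy, either 'cos' (default) or 'linear'

Note: with the schedulers introduced above, the learning rate is a function of the epoch. With OneCycleLR, the learning rate is a function of the number of update steps, so it changes after every batch.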

The 1cycle learning rate policy changes the learning rate after every batch.
.step() should be called after a batch has been used for training.

Cosine annealing strategy:

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, steps_per_epoch=10,
                                                epochs=10, anneal_strategy='cos')

Linear annealing strategy:

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, steps_per_epoch=10,
                                                epochs=10, anneal_strategy='linear')

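The shape of the single cycle can be inspected by recording the learning rate at every step. The sketch below assumes a dummy parameter and an SGD optimizer, and relies on the library defaults for every argument not passed explicitly; the helper name one_cycle_lrs is hypothetical.

```python
import torch

def one_cycle_lrs(anneal_strategy):
    """Record the learning rate at each of the 100 update steps (hypothetical helper)."""
    # Assumption: dummy parameter and SGD optimizer, used only to drive the scheduler.
    params = [torch.nn.Parameter(torch.zeros(1))]
    # The lr given here is a placeholder; OneCycleLR sets the actual learning rate.
    optimizer = torch.optim.SGD(params, lr=0.01)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=0.1, steps_per_epoch=10, epochs=10,
        anneal_strategy=anneal_strategy,
    )
    lrs = []
    for step in range(100):   # epochs * steps_per_epoch update steps
        lrs.append(optimizer.param_groups[0]["lr"])
        optimizer.step()
        scheduler.step()      # called once per batch, after optimizer.step()
    return lrs

cos_lrs = one_cycle_lrs("cos")
linear_lrs = one_cycle_lrs("linear")
peak_step = cos_lrs.index(max(cos_lrs))

print(cos_lrs[0], peak_step, max(cos_lrs), cos_lrs[-1])
```

With pct_start=0.3 and 100 total steps, the peak should fall roughly 30% of the way through the run, and the final learning rate should end up well below the starting one.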

ReduceLROnPlateau

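All the schedulers above are driven by the epoch or by the number of update steps. ReduceLROnPlateau instead decides whether to change the learning rate based on how a chosen metric evolves.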

torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10,
                                           threshold=1e-4)

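Specifically, if the metric shows no significant improvement (what counts as significant is determined by threshold) for more than patience epochs, the learning rate is lowered by multiplying it by factor, a coefficient smaller than 1.

The mode parameter sets the direction in which we want the metric to move. mode='min' means we want the metric to decrease, and the learning rate is reduced when it stops decreasing; mode='max' means we want the metric to increase, and the learning rate is reduced when it stops increasing.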
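A small sketch of the plateau behavior follows. The validation losses are made-up numbers chosen so that the metric stalls, and the dummy parameter, SGD optimizer, and patience=2 are assumptions for the demo. Note that scheduler.step() takes the monitored metric as its argument.

```python
import torch

# Assumption: dummy parameter and SGD optimizer, used only to drive the scheduler.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=2
)

# Hypothetical validation losses: one improvement, then a plateau.
fake_val_losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]

plateau_lrs = []
for val_loss in fake_val_losses:
    optimizer.step()
    scheduler.step(val_loss)   # pass the metric being monitored
    plateau_lrs.append(optimizer.param_groups[0]["lr"])

print(plateau_lrs)
```

In this run the first two stalled epochs are tolerated, and the learning rate is cut by factor on the third consecutive epoch without improvement.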

Combining schedulers

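Use ChainedScheduler to combine schedulers: provide a list of schedulers, and on every learning rate update each scheduler in the list is called in turn (see ChainedScheduler).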
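As an illustration, the sketch below chains an ExponentialLR with a StepLR so that both decays apply on every step. The particular pairing, the gamma values, the dummy parameter, and the SGD optimizer are assumptions chosen for the demo.

```python
import torch
from torch.optim.lr_scheduler import ChainedScheduler, ExponentialLR, StepLR

# Assumption: dummy parameter and SGD optimizer, used only to drive the schedulers.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=1.0)

decay = ExponentialLR(optimizer, gamma=0.9)         # multiply by 0.9 every epoch
drop = StepLR(optimizer, step_size=2, gamma=0.5)    # additionally halve every 2 epochs
scheduler = ChainedScheduler([decay, drop])

chained_lrs = []
for epoch in range(4):
    chained_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()   # calls decay.step() and then drop.step()

print(chained_lrs)
```

The effects multiply: every epoch applies the 0.9 decay, and every second epoch also applies the 0.5 drop.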

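Use SequentialLR to apply different schedulers at different epochs: provide a list of schedulers together with the corresponding epoch milestones, and when the epoch reaches a milestone the next scheduler takes over automatically (see SequentialLR).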
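A possible use is a short constant warm-up followed by exponential decay. The sketch below assumes a dummy parameter, an SGD optimizer, and this particular ConstantLR plus ExponentialLR pairing. The exact value at the switch epoch is best checked by printing it on the installed PyTorch version, so no expected output is listed here.

```python
import torch
from torch.optim.lr_scheduler import ConstantLR, ExponentialLR, SequentialLR

# Assumption: dummy parameter and SGD optimizer, used only to drive the schedulers.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=1.0)

warmup = ConstantLR(optimizer, factor=0.1, total_iters=2)   # reduced lr at the start
decay = ExponentialLR(optimizer, gamma=0.9)                 # used from the milestone on
scheduler = SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[2])

seq_lrs = []
for epoch in range(5):
    seq_lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()   # dispatches to warmup before epoch 2, to decay afterwards

print(seq_lrs)
```

The first two epochs should use the reduced warm-up learning rate; from epoch 2 onward the learning rate should follow the exponential decay.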

How to use them?

One important question remains: when should scheduler.step() be called?

Prior to PyTorch 1.1.0, the learning rate scheduler was expected to be called before the optimizer’s update; 1.1.0 changed this behavior in a BC-breaking way. If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule. If you are unable to reproduce results after upgrading to PyTorch 1.1.0, please check if you are calling scheduler.step() at the wrong time.

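This passage from the official PyTorch documentation means that, from PyTorch 1.1.0 onward, optimizer.step() should be called first, followed by scheduler.step().

Should scheduler.step() then be called at the end of each epoch, or at the end of each batch?
It depends.

For most schedulers, scheduler.step() is called at the end of each epoch: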

for epoch in range(num_epoch):
  for img, labels in train_loader:
    optimizer.zero_grad()  # clear stale gradients before the backward pass
    # forward pass and loss.backward() go here
    .....
    optimizer.step()

  # At the end of the epoch
  scheduler.step()
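For a version that actually runs, here is a minimal sketch of the per-epoch pattern. The linear model, the random synthetic batches, the MSE loss, and the StepLR settings are all assumptions standing in for a real model and train_loader.

```python
import torch

torch.manual_seed(0)

# Assumption: a tiny linear model and random data stand in for a real setup.
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

# Hypothetical synthetic "loader": 3 batches of 8 samples each.
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

epoch_lrs = []
for epoch in range(4):
    for inputs, targets in batches:
        optimizer.zero_grad()                    # clear old gradients
        loss = loss_fn(model(inputs), targets)   # forward pass
        loss.backward()                          # backward pass
        optimizer.step()                         # update the parameters
    epoch_lrs.append(scheduler.get_last_lr()[0])
    scheduler.step()                             # once per epoch, after optimizer.step()

print(epoch_lrs)
```

The learning rate stays fixed within an epoch and only changes between epochs.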

For the OneCycleLR scheduler, however, it should be called at the end of each batch:

for epoch in range(num_epoch):
  for img, labels in train_loader:
    optimizer.zero_grad()  # clear stale gradients before the backward pass
    # forward pass and loss.backward() go here
    .....
    optimizer.step()
    scheduler.step()  # call it every batch, after 'optimizer.step()'
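The per-batch pattern can be sketched the same way. The linear model, the random synthetic batches, and the MSE loss are assumptions standing in for a real model and train_loader; steps_per_epoch is set to the number of batches so that the schedule covers exactly the whole run.

```python
import torch

torch.manual_seed(0)

# Assumption: a tiny linear model and random data stand in for a real setup.
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical synthetic "loader": 4 batches of 8 samples each.
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(4)]
num_epochs = 3

scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, epochs=num_epochs, steps_per_epoch=len(batches)
)

batch_lrs = []
for epoch in range(num_epochs):
    for inputs, targets in batches:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        batch_lrs.append(scheduler.get_last_lr()[0])
        scheduler.step()   # once per batch, after optimizer.step()

print(len(batch_lrs), batch_lrs[0], batch_lrs[-1])
```

Here the learning rate changes after every batch, so one value is recorded per update step rather than per epoch.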