「Explained」The CosineLRScheduler

This article introduces CosineLRScheduler, a strategy based on SGDR (Stochastic Gradient Descent with Warm Restarts) for dynamically adjusting the learning rate during deep-learning training. CosineLRScheduler optimizes training through periodic learning-rate decay and a warmup phase. The article explains the difference between CosineAnnealingLR and CosineAnnealingWarmRestarts and provides usage examples and descriptions of the key parameters.


Training a deep-learning model requires configuring a number of hyperparameters, and choosing them usually relies on experience, which is unfriendly to beginners. This motivated algorithms that adjust the learning rate dynamically, collectively called LR schedulers. This post recommends one very popular scheduler: CosineLRScheduler.

⚠️ Note: in the paper this scheduler is called SGDR, but in practice it is usually referred to as the cosine scheduler. The two are essentially the same; the implementation differences are minor.

The SGDR training curves shown in the paper:

[Figure omitted: SGDR training curves]

1、CosineLRScheduler

from timm.scheduler.cosine_lr import CosineLRScheduler


CosineLRScheduler(optimizer: Optimizer, t_initial: int, t_mul: float = 1.0, lr_min: float = 0.0,
                  decay_rate: float = 1.0, warmup_t=0, warmup_lr_init=0, warmup_prefix=False,
                  cycle_limit=0, t_in_epochs=True, noise_range_t=None, noise_pct=0.67,
                  noise_std=1.0, noise_seed=42, initialize=True) :: Scheduler

CosineLRScheduler accepts an optimizer and a few hyperparameters. We will first look at how to train a model with the cosine scheduler through the timm training script, and then at how to use it as a standalone scheduler in a custom training script (see the sketch after the parameter list below).

Using the cosine scheduler with the timm training script
To train a model with the cosine scheduler, we simply update the training-script args by passing --sched cosine together with the necessary hyperparameters. In this section we also look at how each hyperparameter affects the cosine schedule.

t_initial

The initial number of epochs, e.g. 50, 100, etc.

t_mul

Defaults to 1.0. Controls how the SGDR schedule anneals across restarts: the cycle length is multiplied by this factor after each restart (the analogue of T_mult in CosineAnnealingWarmRestarts).

lr_min: the minimum learning rate

Defaults to 1e-5. The minimum learning rate used during training; the learning rate never drops below this value.

decay_rate: the decay rate

When 0 < decay_rate < 1, at every restart the learning rate is decayed to a new value equal to lr * decay_rate. So if decay_rate=0.5, the learning rate after the first restart becomes half of the initial lr.


warmup_t: the number of warmup epochs

warmup_lr_init: the initial learning rate during warmup

warmup_prefix

Defaults to False. If set to True, every new epoch number is computed as epoch = epoch - warmup_t.

cycle_limit

The maximum number of restarts in SGDR.

t_in_epochs

If set to False, the schedule is defined in terms of update steps rather than epochs, and the learning rates returned for epoch t are None.

initialize

If set to True, an initial_lr attribute is set on each optimizer param group. Defaults to True.
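
To make these parameters concrete, here is a minimal standalone sketch (not from the original post): the model, the training loop, and every hyperparameter value are illustrative assumptions, and it follows the older timm API documented above (newer timm versions rename t_mul/decay_rate to cycle_mul/cycle_decay).

import torch
from timm.scheduler.cosine_lr import CosineLRScheduler

# Dummy model and optimizer purely for illustration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

scheduler = CosineLRScheduler(
    optimizer,
    t_initial=50,          # length of the first cycle, in epochs
    lr_min=1e-5,           # the lr never drops below this value
    decay_rate=0.5,        # peak lr is halved at every restart (older timm; newer: cycle_decay)
    warmup_t=5,            # number of warmup epochs
    warmup_lr_init=1e-4,   # lr at the start of warmup
    cycle_limit=2,         # run at most 2 cycles, i.e. a single restart
    t_in_epochs=True,      # the schedule is defined per epoch
)

num_epochs = 100
for epoch in range(num_epochs):
    print(epoch, optimizer.param_groups[0]["lr"])  # lr used for this epoch
    # ... run one epoch of training here ...
    scheduler.step(epoch + 1)  # timm convention: pass the index of the upcoming epoch

Stepping with the epoch index, rather than calling step() with no arguments, is what distinguishes timm schedulers from the torch.optim.lr_scheduler classes discussed next.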



2、CosineAnnealingLR

⚠️ Note:
CosineAnnealingLR only implements the cosine annealing part of SGDR, not the restarts.
The full version with restarts is CosineAnnealingWarmRestarts.

Set the learning rate of each parameter group using a cosine annealing schedule, where $\eta_{max}$ is set to the initial lr and $T_{cur}$ is the number of epochs since the last restart in SGDR:

$$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max}-\eta_{min})\left(1+\cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), \qquad T_{cur}\neq(2k+1)T_{max};$$
$$\eta_{t+1} = \eta_{t} + \frac{1}{2}(\eta_{max}-\eta_{min})\left(1-\cos\left(\frac{1}{T_{max}}\pi\right)\right), \qquad T_{cur}=(2k+1)T_{max}.$$

When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators. If the learning rate is set solely by this scheduler, the learning rate at each step becomes:

$$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max}-\eta_{min})\left(1+\cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$$
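
As a quick sanity check (not part of the original text), plugging the endpoints of a cycle into this formula recovers the maximum and minimum learning rates:

$$T_{cur}=0:\ \eta_t=\eta_{min}+\tfrac{1}{2}(\eta_{max}-\eta_{min})(1+\cos 0)=\eta_{max}, \qquad T_{cur}=T_{max}:\ \eta_t=\eta_{min}+\tfrac{1}{2}(\eta_{max}-\eta_{min})(1+\cos\pi)=\eta_{min}.$$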

from torch.optim import lr_scheduler


lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)



Parameters:
  optimizer (Optimizer) – Wrapped optimizer.
  T_max (int) 		– Maximum number of iterations.
  eta_min (float) 	– Minimum learning rate. Default: 0.
  last_epoch (int) 	– The index of last epoch. Default: -1.
  verbose (bool) 	– If True, prints a message to stdout for each update. Default: False.
  • get_last_lr()
    Return last computed learning rate by current scheduler.
  • load_state_dict(state_dict)
Loads the scheduler's state.
    Parameters:
    state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
  • print_lr(is_verbose, group, lr, epoch=None)
    Display the current learning rate.
  • state_dict()
    Returns the state of the scheduler as a dict.
    It contains an entry for every variable in self.__dict__ which is not the optimizer.
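
A minimal usage sketch for CosineAnnealingLR (the model, optimizer, and the choice of T_max and eta_min below are illustrative assumptions): the scheduler is stepped once per epoch, and T_max is usually set to the total number of epochs so that the learning rate decays from the initial lr down to eta_min over the whole run, without any restart.

import torch
from torch.optim import SGD, lr_scheduler

# Dummy model and optimizer for illustration only.
model = torch.nn.Linear(10, 2)
optimizer = SGD(model.parameters(), lr=0.1)

num_epochs = 100
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=1e-5)

for epoch in range(num_epochs):
    print(epoch, scheduler.get_last_lr()[0])  # lr used for this epoch
    # ... train for one epoch here ...
    optimizer.step()   # placeholder; avoids the "scheduler.step() before optimizer.step()" warning
    scheduler.step()   # one scheduler step per epoch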

3、CosineAnnealingWarmRestarts

from torch.optim import lr_scheduler
lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)


Parameters
  optimizer (Optimizer) – Wrapped optimizer.
  T_0 (int) – Number of iterations for the first restart.
  T_mult (int, optional) – A factor by which T_i increases after a restart. Default: 1.
  eta_min (float, optional) – Minimum learning rate. Default: 0.
  last_epoch (int, optional) – The index of last epoch. Default: -1.
  verbose (bool) – If True, prints a message to stdout for each update. Default: False.

Set the learning rate of each parameter group using a cosine annealing schedule, where $\eta_{max}$ is set to the initial lr, $T_{cur}$ is the number of epochs since the last restart and $T_i$ is the number of epochs between two warm restarts in SGDR:

$$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max}-\eta_{min})\left(1+\cos\left(\frac{T_{cur}}{T_i}\pi\right)\right)$$

When $T_{cur}=T_i$, set $\eta_t=\eta_{min}$. When $T_{cur}=0$ after a restart, set $\eta_t=\eta_{max}$.

  • get_last_lr()
    Return last computed learning rate by current scheduler.
  • load_state_dict(state_dict)
Loads the scheduler's state.
    Parameters:
    state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().
  • print_lr(is_verbose, group, lr, epoch=None)
    Display the current learning rate.
  • state_dict()
    Returns the state of the scheduler as a dict.
    It contains an entry for every variable in self.__dict__ which is not the optimizer.
  • step(epoch=None)
    step() can be called after every batch update, as in the examples below.

Step Example

-----------------------------------------------------------------
""" called after every batch update """
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
iters = len(dataloader)

for epoch in range(20):
    for i, sample in enumerate(dataloader):
        inputs, labels = sample['inputs'], sample['labels']
        
        optimizer.zero_grad()
        outputs = net(inputs)
        loss    = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        scheduler.step(epoch + i / iters)



-----------------------------------------------------------------
""" called in an interleaved way. """ 

scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
for epoch in range(20):
    scheduler.step()
scheduler.step(26)
scheduler.step()
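
To see the warm restarts explicitly, here is a short self-contained sketch (the model and the values T_0=10, T_mult=2 are illustrative assumptions): stepping once per epoch and printing the learning rate shows it decaying towards eta_min and then jumping back to the initial lr at epochs 10 and 30.

import torch
from torch.optim import SGD, lr_scheduler

# Dummy model/optimizer so the scheduler has parameter groups to drive.
model = torch.nn.Linear(10, 2)
optimizer = SGD(model.parameters(), lr=0.1)

# First cycle lasts 10 epochs; each subsequent cycle is twice as long (T_mult=2).
scheduler = lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2, eta_min=1e-5)

for epoch in range(40):
    print(epoch, optimizer.param_groups[0]["lr"])  # lr used for this epoch
    # ... training for one epoch would go here ...
    optimizer.step()   # placeholder; avoids the step-order warning
    scheduler.step()   # per-epoch stepping: restarts land at epochs 10 and 30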
