A Summary of Learning Rate Scheduling Methods in PyTorch

JL_Jessie

Article link: https://blog.csdn.net/m0_37531129/article/details/107794136

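The learning rate is the most important optimizer hyperparameter; a well-chosen learning rate lets the optimizer converge quickly. Training usually starts with a relatively large learning rate that is gradually reduced as training proceeds. When to reduce it, and by how much, is what learning rate scheduling methods decide.

PyTorch 1.6.0 provides 10 learning rate schedulers; this post gives a brief summary of them.

They can be grouped into three categories: scheduled adjustment, adaptive adjustment, and custom adjustment.

Category 1: scheduled adjustment. The learning rate follows a fixed rule. This is the most commonly used category and includes equal-interval decay (StepLR), decay at user-chosen milestones (MultiStepLR), exponential decay (ExponentialLR), and cosine annealing (CosineAnnealingLR). With these methods, the timing of every adjustment is set by hand in advance.

Category 2: adaptive adjustment, i.e. ReduceLROnPlateau. It adjusts the learning rate according to the state of training: it monitors a metric, and when that metric stops improving, it reduces the learning rate.

Category 3: custom adjustment, i.e. LambdaLR. It is very flexible: a different schedule can be set for each parameter group. This is useful in fine-tuning, where different layers can be given not only different learning rates but also different scheduling strategies.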

1. LambdaLR

torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)

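Sets a separate learning rate schedule for each parameter group. The rule is lr = base_lr * lambda(self.last_epoch).

Parameters:

lr_lambda (function or list): a function that computes the multiplicative factor for the learning rate; its input is usually the step (epoch) index. When the optimizer has several parameter groups, pass a list with one function per group.
last_epoch (int): index of the last epoch. The scheduler uses it to determine which learning rate currently applies, for example when resuming training. The default, -1, starts the schedule from the initial learning rate.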

>>> from torch import optim
>>> from torch.optim.lr_scheduler import LambdaLR
>>> # Assuming optimizer has two groups.
>>> ignored_params = list(map(id, net.fc3.parameters()))
>>> base_params = filter(lambda p: id(p) not in ignored_params, net.parameters())
>>> optimizer = optim.SGD([{'params': base_params}, {'params': net.fc3.parameters(), 'lr': 0.001*100}], 0.001, momentum=0.9, weight_decay=1e-4)
>>> lambda1 = lambda epoch: epoch // 3     # group 1: factor grows by 1 every 3 epochs
>>> lambda2 = lambda epoch: 0.95 ** epoch  # group 2: exponential decay
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     # Print the learning rates in effect for this epoch, before stepping the scheduler
>>>     print('epoch:', epoch, 'lr:', scheduler.get_last_lr())
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

Output (first 10 epochs):
epoch: 0 lr: [0.0, 0.1]
epoch: 1 lr: [0.0, 0.095]
epoch: 2 lr: [0.0, 0.09025]
epoch: 3 lr: [0.001, 0.0857375]
epoch: 4 lr: [0.001, 0.081450625]
epoch: 5 lr: [0.001, 0.07737809374999999]
epoch: 6 lr: [0.002, 0.07350918906249998]
epoch: 7 lr: [0.002, 0.06983372960937498]
epoch: 8 lr: [0.002, 0.06634204312890622]
epoch: 9 lr: [0.003, 0.0630249409724609]

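Why is the learning rate of the first parameter group 0? Look at how it is computed.
The first group has a base learning rate of 0.001 and lambda1 = lambda epoch: epoch // 3.
At the first epoch (epoch 0), lr = base_lr * lambda(self.last_epoch) gives lr = 0.001 * (0 // 3) = 0.
The second group starts at 0.1 and follows lr = 0.1 * 0.95 ** epoch, so lr = 0.1 at epoch 0 and lr = 0.1 * 0.95 at epoch 1.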
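
The calculation above can be reproduced without PyTorch. The following is a minimal plain-Python sketch of the rule lr = base_lr * lambda(epoch); the helper name lambda_lr is hypothetical and is not part of the PyTorch API.

```python
def lambda_lr(base_lrs, lr_lambdas, epoch):
    """Return one learning rate per parameter group for the given epoch.

    Hypothetical helper: applies lr = base_lr * lambda(epoch) to each group.
    """
    return [base_lr * fn(epoch) for base_lr, fn in zip(base_lrs, lr_lambdas)]


# Same setup as the example above: two groups with base lrs 0.001 and 0.1.
base_lrs = [0.001, 0.1]
lr_lambdas = [lambda epoch: epoch // 3, lambda epoch: 0.95 ** epoch]

for epoch in range(4):
    print("epoch:", epoch, "lr:", lambda_lr(base_lrs, lr_lambdas, epoch))
```

The first group stays at 0 for epochs 0 to 2 because epoch // 3 is 0 there, which is usually not what one wants for a real training run.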

2. MultiplicativeLR

3. StepLR

torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)

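Decays the learning rate by a factor of gamma at equal intervals. The interval is measured in steps; note that a step usually means an epoch, not an iteration.
Parameters:

step_size (int): interval between decays. With step_size=30, the learning rate is multiplied by gamma at steps 30, 60, 90, ...
gamma (float): multiplicative decay factor. The default, 0.1, reduces the learning rate to one tenth of its previous value.
last_epoch (int): index of the last epoch. The scheduler uses it to determine which learning rate currently applies, for example when resuming training. The default, -1, starts the schedule from the initial learning rate.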

>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05     if epoch < 30
>>> # lr = 0.005    if 30 <= epoch < 60
>>> # lr = 0.0005   if 60 <= epoch < 90
>>> # ...
>>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
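
The schedule in the comments above has a simple closed form. A minimal plain-Python sketch, assuming a hypothetical helper named step_lr (not a PyTorch function):

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Closed form of equal-interval decay: one factor of gamma per completed interval."""
    return base_lr * gamma ** (epoch // step_size)


# Reproduce the comment table: lr = 0.05, step_size = 30, gamma = 0.1.
for epoch in (0, 29, 30, 59, 60, 89):
    print(epoch, step_lr(0.05, 30, 0.1, epoch))
```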

4. MultiStepLR

torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)

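Decays the learning rate at the specified milestones. This scheduler is well suited to later-stage tuning: inspect the loss curve and choose the decay points for each experiment.

Parameters:

milestones (list): a list in which each element is an epoch index at which the learning rate is decayed. The elements must be increasing, e.g. milestones=[30,40,80,120].
gamma (float): multiplicative decay factor. The default, 0.1, reduces the learning rate to one tenth of its previous value.
last_epoch (int): index of the last epoch. The scheduler uses it to determine which learning rate currently applies, for example when resuming training. The default, -1, starts the schedule from the initial learning rate.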

>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05     if epoch < 30
>>> # lr = 0.005    if 30 <= epoch < 80
>>> # lr = 0.0005   if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()
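
The milestone rule can be written in closed form by counting how many milestones have already been reached. A minimal plain-Python sketch, assuming a hypothetical helper named multi_step_lr and a sorted milestone list:

```python
from bisect import bisect_right


def multi_step_lr(base_lr, milestones, gamma, epoch):
    """Apply one factor of gamma for every milestone that is <= epoch.

    milestones must be sorted in increasing order.
    """
    return base_lr * gamma ** bisect_right(milestones, epoch)


# Reproduce the comment table: lr = 0.05, milestones = [30, 80], gamma = 0.1.
for epoch in (0, 29, 30, 79, 80, 99):
    print(epoch, multi_step_lr(0.05, [30, 80], 0.1, epoch))
```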

5. ExponentialLR

torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)

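Decays the learning rate exponentially: lr = base_lr * gamma ** epoch.

Parameters:

gamma (float): base of the exponential decay factor; the exponent is the epoch index, i.e. the factor is gamma ** epoch.
last_epoch (int): index of the last epoch. The scheduler uses it to determine which learning rate currently applies, for example when resuming training. The default, -1, starts the schedule from the initial learning rate.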
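
Exponential decay can be computed either directly from the epoch index or by multiplying the previous learning rate by gamma once per epoch; both give the same sequence. A minimal plain-Python sketch with hypothetical helper names (not PyTorch functions):

```python
import math


def exponential_lr(base_lr, gamma, epoch):
    """Closed form: lr = base_lr * gamma ** epoch."""
    return base_lr * gamma ** epoch


def exponential_lr_sequence(base_lr, gamma, num_epochs):
    """Recursive form: each epoch multiplies the previous lr by gamma."""
    lrs = [base_lr]
    for _ in range(num_epochs - 1):
        lrs.append(lrs[-1] * gamma)
    return lrs


sequence = exponential_lr_sequence(0.1, 0.9, 5)
for epoch, lr in enumerate(sequence):
    # The two forms agree up to floating-point rounding.
    assert math.isclose(lr, exponential_lr(0.1, 0.9, epoch))
    print(epoch, lr)
```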

6. CosineAnnealingLR

torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)

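Adjusts the learning rate by cosine annealing: the learning rate follows a cosine curve that starts at its maximum value.
The schedule is:
$\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$
As the formula shows, cosine annealing takes the initial learning rate as the maximum learning rate and has a period of 2*T_max; within one period, the learning rate first decreases and then increases.

Parameters:

T_max (int): half of the cosine period, in steps. The learning rate falls from its initial value to eta_min over T_max epochs; if training continues, it rises back to the initial value over the next T_max epochs.
eta_min (float): minimum learning rate, i.e. the lowest value reached within a period. Default: 0.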
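
The formula can be evaluated directly to see the fall and rise over one period. A minimal plain-Python sketch, assuming a hypothetical helper named cosine_annealing_lr that takes the initial learning rate as the maximum:

```python
import math


def cosine_annealing_lr(base_lr, T_max, epoch, eta_min=0.0):
    """Evaluate eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max))."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max))


# With T_max = 50 the period is 100 epochs: the lr falls until epoch 50, then rises again.
for epoch in (0, 25, 50, 75, 100):
    print(epoch, cosine_annealing_lr(0.1, 50, epoch))
```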

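7. ReduceLROnPlateau

torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)

This is the adaptive scheduler, and a very practical one.
It reduces the learning rate when a monitored metric stops improving (the loss no longer decreases, or the accuracy no longer increases).

Parameters:

mode (str): either min or max. In min mode the learning rate is reduced when the metric (e.g. loss) stops decreasing; in max mode, when the metric (e.g. accuracy) stops increasing.
factor (float): multiplicative decay factor, equivalent to gamma in the other schedulers: lr = lr * factor.
patience (int): how many steps without improvement are tolerated before the learning rate is reduced. For example, with patience=2 the first 2 epochs without improvement are ignored, and the learning rate is reduced after the 3rd such epoch.
verbose (bool): whether to print a message on each learning rate update. Default: False.
threshold (float): threshold for measuring the new optimum; used together with threshold_mode.
threshold_mode (str): how to decide whether the metric has reached a new optimum, either rel or abs.
When threshold_mode == rel and mode == max, dynamic_threshold = best * (1 + threshold).
When threshold_mode == rel and mode == min, dynamic_threshold = best * (1 - threshold).
When threshold_mode == abs and mode == max, dynamic_threshold = best + threshold.
When threshold_mode == abs and mode == min, dynamic_threshold = best - threshold.
cooldown (int): the "cooling-off" period. After a reduction, the scheduler waits this many steps, letting the model train for a while, before it resumes monitoring.
min_lr (float or list): lower bound on the learning rate. It can be a float or a list; with several parameter groups, a list sets one bound per group.
eps (float): minimal decay applied to the learning rate. If the difference between the old and new learning rate is smaller than eps, the update is ignored.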

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = ReduceLROnPlateau(optimizer, 'min')
>>> for epoch in range(10):
>>>     train(...)
>>>     val_loss = validate(...)
>>>     # Note that step should be called after validate()
>>>     scheduler.step(val_loss)

8. CyclicLR

torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1)

This scheduler comes from the paper Cyclical Learning Rates for Training Neural Networks.

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1)
>>> data_loader = torch.utils.data.DataLoader(...)
>>> for epoch in range(10):
>>>     for batch in data_loader:
>>>         train_batch(...)
>>>         scheduler.step()
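
For the default mode='triangular', the learning rate rises linearly from base_lr to max_lr over step_size_up iterations and then falls back over step_size_down iterations. A minimal plain-Python sketch of that shape, assuming a hypothetical helper named triangular_cyclic_lr and ignoring the gamma, scale_fn and momentum options:

```python
import math


def triangular_cyclic_lr(base_lr, max_lr, iteration, step_size_up=2000, step_size_down=None):
    """Triangular policy: linear rise to max_lr, then linear fall back to base_lr."""
    if step_size_down is None:
        step_size_down = step_size_up
    total_size = step_size_up + step_size_down
    step_ratio = step_size_up / total_size
    cycle = math.floor(1 + iteration / total_size)
    x = 1.0 + iteration / total_size - cycle  # position inside the current cycle, in [0, 1)
    if x <= step_ratio:
        scale = x / step_ratio                # rising part
    else:
        scale = (x - 1) / (step_ratio - 1)    # falling part
    return base_lr + (max_lr - base_lr) * scale


# base_lr = 0.01, max_lr = 0.1 as in the example above; one cycle is 4000 iterations.
for iteration in (0, 1000, 2000, 3000, 4000):
    print(iteration, triangular_cyclic_lr(0.01, 0.1, iteration))
```

Because the schedule is defined per iteration, the scheduler is stepped after every batch rather than after every epoch, as in the example above.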

9. OneCycleLR

10. CosineAnnealingWarmRestarts

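Note: in recent versions of PyTorch, scheduler.step() no longer needs an epoch argument. Each call without an argument increments the internal counter last_epoch by one and computes the new learning rate with get_lr(). If you call step() in the epoch loop, the counter advances once per epoch, as before; if you call lr_scheduler.step() inside the iteration loop, it advances once per iteration. Passing epoch explicitly is deprecated; in that case the scheduler uses _get_closed_form_lr() when it is defined and get_lr() otherwise.

In the source file torch/optim/lr_scheduler.py, step() is defined in the _LRScheduler class, the base class of all learning rate schedulers. This class defines the basic methods, such as step() and the most commonly used one, get_lr(). In the base class, get_lr() only raises NotImplementedError, so every subclass must override it.
For details, see: lr_scheduler
Let's look at the step() function.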

class _LRScheduler(object):

    def __init__(self, optimizer, last_epoch=-1):

        # Attach optimizer
        if not isinstance(optimizer, Optimizer):
            raise TypeError('{} is not an Optimizer'.format(
                type(optimizer).__name__))
        self.optimizer = optimizer

        # Initialize epoch and base learning rates
        if last_epoch == -1:
            for group in optimizer.param_groups:
                group.setdefault('initial_lr', group['lr'])
        else:
            for i, group in enumerate(optimizer.param_groups):
                if 'initial_lr' not in group:
                    raise KeyError("param 'initial_lr' is not specified "
                                   "in param_groups[{}] when resuming an optimizer".format(i))
        self.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
        self.last_epoch = last_epoch

        # Following https://github.com/pytorch/pytorch/issues/20124
        # We would like to ensure that `lr_scheduler.step()` is called after
        # `optimizer.step()`
        def with_counter(method):
            if getattr(method, '_with_counter', False):
                # `optimizer.step()` has already been replaced, return.
                return method

            # Keep a weak reference to the optimizer instance to prevent
            # cyclic references.
            instance_ref = weakref.ref(method.__self__)
            # Get the unbound method for the same purpose.
            func = method.__func__
            cls = instance_ref().__class__
            del method

            @wraps(func)
            def wrapper(*args, **kwargs):
                instance = instance_ref()
                instance._step_count += 1
                wrapped = func.__get__(instance, cls)
                return wrapped(*args, **kwargs)

            # Note that the returned function here is no longer a bound method,
            # so attributes like `__func__` and `__self__` no longer exist.
            wrapper._with_counter = True
            return wrapper

        self.optimizer.step = with_counter(self.optimizer.step)
        self.optimizer._step_count = 0
        self._step_count = 0

        self.step()

    def state_dict(self):
        """Returns the state of the scheduler as a :class:`dict`.

        It contains an entry for every variable in self.__dict__ which
        is not the optimizer.
        """
        return {key: value for key, value in self.__dict__.items() if key != 'optimizer'}

    def load_state_dict(self, state_dict):
        """Loads the schedulers state.

        Arguments:
            state_dict (dict): scheduler state. Should be an object returned
                from a call to :meth:`state_dict`.
        """
        self.__dict__.update(state_dict)

    def get_last_lr(self):
        """ Return last computed learning rate by current scheduler.
        """
        return self._last_lr

    def get_lr(self):
        # Compute learning rate using chainable form of the scheduler
        raise NotImplementedError

    def step(self, epoch=None):
        # Raise a warning if old pattern is detected
        # https://github.com/pytorch/pytorch/issues/20124
        if self._step_count == 1:
            if not hasattr(self.optimizer.step, "_with_counter"):
                warnings.warn("Seems like `optimizer.step()` has been overridden after learning rate scheduler "
                              "initialization. Please, make sure to call `optimizer.step()` before "
                              "`lr_scheduler.step()`. See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

            # Just check if there were two first lr_scheduler.step() calls before optimizer.step()
            elif self.optimizer._step_count < 1:
                warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
                              "In PyTorch 1.1.0 and later, you should call them in the opposite order: "
                              "`optimizer.step()` before `lr_scheduler.step()`.  Failure to do this "
                              "will result in PyTorch skipping the first value of the learning rate schedule. "
                              "See more details at "
                              "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
        self._step_count += 1

        class _enable_get_lr_call:

            def __init__(self, o):
                self.o = o

            def __enter__(self):
                self.o._get_lr_called_within_step = True
                return self

            def __exit__(self, type, value, traceback):
                self.o._get_lr_called_within_step = False

        with _enable_get_lr_call(self):
            if epoch is None:
                self.last_epoch += 1
                values = self.get_lr()
            else:
                warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
                self.last_epoch = epoch
                if hasattr(self, "_get_closed_form_lr"):
                    values = self._get_closed_form_lr()
                else:
                    values = self.get_lr()

        for param_group, lr in zip(self.optimizer.param_groups, values):
            param_group['lr'] = lr

        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]
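
Stripped of the warnings and the deprecated epoch argument, the bookkeeping in step() is small. The following plain-Python sketch mimics it with ordinary dicts in place of optimizer.param_groups; the class names ToyScheduler and ToyExponential are hypothetical and are used only for illustration.

```python
class ToyScheduler:
    """Plain-Python stand-in for the core bookkeeping of the base class above."""

    def __init__(self, param_groups, last_epoch=-1):
        self.param_groups = param_groups
        for group in param_groups:
            group.setdefault("initial_lr", group["lr"])
        self.base_lrs = [group["initial_lr"] for group in param_groups]
        self.last_epoch = last_epoch
        self.step()  # as in the source, the constructor performs one initial step

    def get_lr(self):
        # Subclasses must override this, as with the real base class.
        raise NotImplementedError

    def step(self):
        self.last_epoch += 1
        for group, lr in zip(self.param_groups, self.get_lr()):
            group["lr"] = lr
        self._last_lr = [group["lr"] for group in self.param_groups]

    def get_last_lr(self):
        return self._last_lr


class ToyExponential(ToyScheduler):
    """Example subclass: lr = base_lr * gamma ** last_epoch."""

    def __init__(self, param_groups, gamma, last_epoch=-1):
        self.gamma = gamma
        super().__init__(param_groups, last_epoch)

    def get_lr(self):
        return [base_lr * self.gamma ** self.last_epoch for base_lr in self.base_lrs]


groups = [{"lr": 0.1}]
scheduler = ToyExponential(groups, gamma=0.5)
print(scheduler.last_epoch, scheduler.get_last_lr())  # after construction
scheduler.step()
print(scheduler.last_epoch, scheduler.get_last_lr())  # after one step
```

The initial step() inside the constructor is why last_epoch is 0, not -1, right after a scheduler is created, and why the first learning rate equals the initial one.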

