1. How to set up learning rate decay in PyTorch
We often want to decay the learning rate during training. The code below decays the learning rate by a factor of 10 (i.e. multiplies it by 0.1) every 30 epochs:
def adjust_learning_rate(optimizer, epoch):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
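For reference, PyTorch's built-in torch.optim.lr_scheduler.StepLR implements the same step decay without a hand-written function. A minimal sketch (model, num_epochs and train_one_epoch are placeholders, not from the original code):

import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # 0.1 is the initial LR
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)           # multiply the LR by 0.1 every 30 epochs

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)   # placeholder training function
    scheduler.step()                    # advance the schedule once per epoch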
What are param_groups?
The optimizer manages its parameters through param_groups. Each param_group holds a group of parameters together with its own learning rate, momentum and other hyperparameters, so we can change the learning rate of a particular group by modifying that group's param_group['lr'].
# Two param_groups here, i.e. len(optimizer.param_groups) == 2
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
# A single parameter group
optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
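Continuing the first (two-group) example above, a quick sketch of inspecting and changing the groups at runtime:

print(len(optimizer.param_groups))                 # -> 2
print([g['lr'] for g in optimizer.param_groups])   # -> [0.01, 0.001]

# Lower the learning rate of the second (classifier) group only
optimizer.param_groups[1]['lr'] = 1e-4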
Here is an example from my own code:
# Do not apply weight decay to biases and LayerNorm parameters;
# encoder and decoder get separate learning rates.
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
grouped_parameters = [
    {'params': [p for n, p in model.encoder.named_parameters() if not any(nd in n for nd in no_decay)],
     'weight_decay': args.weight_decay, 'lr': args.encoder_lr},
    {'params': [p for n, p in model.encoder.named_parameters() if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0, 'lr': args.encoder_lr},
    {'params': [p for n, p in model.decoder.named_parameters() if not any(nd in n for nd in no_decay)],
     'weight_decay': args.weight_decay, 'lr': args.decoder_lr},
    {'params': [p for n, p in model.decoder.named_parameters() if any(nd in n for nd in no_decay)],
     'weight_decay': 0.0, 'lr': args.decoder_lr},
]
optimizer = OPTIMIZER_CLASSES[args.optim](grouped_parameters)
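OPTIMIZER_CLASSES here is presumably a dictionary mapping the value of args.optim to an optimizer class; a minimal sketch of what it could look like (the concrete entries are my assumption, not part of the original code):

import torch.optim as optim

OPTIMIZER_CLASSES = {
    'sgd': optim.SGD,
    'adam': optim.Adam,
    'adamw': optim.AdamW,
}

# Each dict in grouped_parameters carries its own 'lr' and 'weight_decay',
# so no extra defaults need to be passed here.
optimizer = OPTIMIZER_CLASSES[args.optim](grouped_parameters)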
2. The warmup strategy
The initial value of warmup_lr is inversely proportional to the size of the training corpus: the larger the corpus, the smaller the initial warmup_lr. The learning rate then grows until it reaches the same order of magnitude as the preset hyperparameter initial_learning_rate, and afterwards it gradually decreases according to decay_rates (a minimal sketch of this schedule shape follows the list below).
What are the benefits?
1) It lets the learning rate adapt to different training-set sizes. In practice we often train and validate the model on a small dataset first, and then switch to a large dataset to train the model for the production environment.
2) Even if the learning rate happens to be set too large, the warmup mechanism still lets us observe a suitable learning-rate range (around the critical point where the training error first drops and then rises), which is useful for later verification.
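To make the shape concrete, here is a minimal sketch of linear warmup followed by linear decay built on torch.optim.lr_scheduler.LambdaLR (this is essentially what get_linear_schedule_with_warmup below does internally; warmup_steps=1000 and total_steps=10000 are placeholder values, and optimizer is the one built above):

from torch.optim.lr_scheduler import LambdaLR

def warmup_linear(step, warmup_steps, total_steps):
    # Ramp the LR multiplier linearly from 0 to 1 during warmup,
    # then decay it linearly back to 0 over the remaining steps.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer,
                     lr_lambda=lambda step: warmup_linear(step, warmup_steps=1000, total_steps=10000))
# scheduler.step() is then called once per optimizer step (per batch), not per epoch.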
# Imports: newer versions of transformers replaced the schedule classes with get_* functions
try:
    from transformers import (ConstantLRSchedule, WarmupLinearSchedule, WarmupConstantSchedule)
except ImportError:
    from transformers import get_constant_schedule, get_constant_schedule_with_warmup, get_linear_schedule_with_warmup
......
......
# Written in main:
if args.lr_schedule == 'fixed':
    try:
        scheduler = ConstantLRSchedule(optimizer)
    except NameError:
        scheduler = get_constant_schedule(optimizer)
elif args.lr_schedule == 'warmup_constant':
    try:
        scheduler = WarmupConstantSchedule(optimizer, warmup_steps=args.warmup_steps)
    except NameError:
        scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps)
elif args.lr_schedule == 'warmup_linear':
    max_steps = int(args.n_epochs * (dataset.train_size() / args.batch_size))
    try:
        scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=max_steps)
    except NameError:
        scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=max_steps)
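Note that these transformers schedulers are meant to be stepped once per optimizer update (per batch), not once per epoch. A rough sketch of the training loop under that assumption (train_loader and compute_loss are placeholders):

for epoch in range(args.n_epochs):
    for batch in train_loader:               # placeholder DataLoader
        loss = compute_loss(model, batch)    # placeholder forward pass returning a scalar loss
        loss.backward()
        optimizer.step()
        scheduler.step()                     # advance the warmup/decay schedule every step
        optimizer.zero_grad()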
Continuously updated…
References:
- https://www.pytorchtutorial.com/pytorch-learning-rate-decay/
- https://www.zhihu.com/question/338066667/answer/973639422