8.pytorch lightning之训练时的技术

最新推荐文章于 2025-03-24 22:46:30 发布

CsdnWujinming

最新推荐文章于 2025-03-24 22:46:30 发布

阅读量2k

点赞数

文章标签： python 深度学习机器学习

本文链接：https://blog.csdn.net/CsdnWujinming/article/details/130413777

版权

训练可用的技术

梯度累积 Accumulate Gradients

accumulate_grad_batches=K，设置后会每K个batches累积梯度，然后反向传播一次，相当于增加了batch size但内存没有增加。

# DEFAULT (ie: no accumulated grads)
trainer = Trainer(accumulate_grad_batches=1)

# Accumulate gradients for 7 batches
trainer = Trainer(accumulate_grad_batches=7)

梯度裁剪 Gradient Clipping

将梯度值限制在指定范围内，防止梯度爆炸。

# DEFAULT (ie: don't clip) 不u
trainer = Trainer(gradient_clip_val=0)

# clip gradients' global norm to <=0.5 using gradient_clip_algorithm='norm' by default
# 将norm后>0.5的梯度裁剪
trainer = Trainer(gradient_clip_val=0.5)

# 将梯度值>0.5的裁剪
# clip gradients' maximum magnitude to <=0.5
trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")

学习率搜索

PL中内置学习率搜索工具，其使用论文Cyclical Learning Rates for Training Neural Networks的方法将学习率逐渐增大，然后运行少量batches，最后绘制loss vs lr曲线，并给出建议的学习率。

为了搜索学习率，LightningModule模型中必须声明一个learning_rate属性，创建Tuner对象，调用lr_fine方法。

from lightning.pytorch.tuner import Tuner


class LitModel(LightningModule):
    def __init__(self, learning_rate):
        super().__init__()
        # 声明学习率字段
        self.learning_rate = learning_rate
        self.model = Model(...)

    # 配置优化器，但是搜索学习率时只能指定一个优化器。
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=(self.lr or self.learning_rate))


model = LitModel()
trainer = Trainer(...)

# Create a Tuner
tuner = Tuner(trainer)

# finds learning rate automatically
# sets hparams.lr or hparams.learning_rate to that learning rate
tuner.lr_find(model)

# to set to your own hparams.my_value 如果模型中学习率的字段不为learning_rate，则需要声明字段名。
# tuner.lr_find(model, attr_name="my_value")


# 打印搜索结果
print(lr_finder.results)

# 画出曲线图
fig = lr_finder.plot(suggest=True)
fig.show()

# 获取建议值Pick point based on plot, or get suggestion
new_lr = lr_finder.suggestion()

# update hparams of the model
model.hparams.lr = new_lr

# Fit model
trainer.fit(model)