Translation (56): Gradient Clipping in PyTorch

MWHLS

Original link: https://stackoverflow.com/questions/54716377/how-to-do-gradient-clipping-in-pytorch?answertab=modifieddesc#tab-top

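First published, with later updates, at https://mwhls.top/3785.html. If images or the table of contents are missing, the formatting is broken, or you want related posts, see that page.
New updates are posted at mwhls.top.
Questions and criticism are welcome. Thank you!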

If you notice any translation problems, please point them out in the comments. Thanks.

How to do gradient clipping in pytorch?

Gulzar asked:
- What is the correct way to perform gradient clipping in pytorch?
- I have an exploding gradients problem.

Answers:
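- Rahul - vote: 143
  - A more complete example from here:
  - ```
    # Assumes model, optimizer, data, hidden, targets, and args.clip are defined elsewhere
    optimizer.zero_grad()
    loss, hidden = model(data, hidden, targets)
    loss.backward()

    # Clip the total gradient norm after backward() and before optimizer.step()
    torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
    optimizer.step()
    ```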
- Charles Xu - vote: 0
  - Well, I ran into the same error. I tried to use the clip norm, but it didn't work.
    Translator's note: in the comments, the answerer clarified that "doesn't work" means it "still gives a 'nan'".
  - I didn't want to change the network or add regularizers, so I changed the optimizer to Adam, and it worked.
  - Then I used the model pretrained with Adam to initialize the training and used SGD + momentum for fine-tuning. It is now working.
- hkchengrex - vote: 3
  - And if you are using Automatic Mixed Precision (AMP), you need to do a bit more before clipping:
  - ```
    # Assumes scaler is a GradScaler instance created before the training loop
    optimizer.zero_grad()
    loss, hidden = model(data, hidden, targets)
    scaler.scale(loss).backward()

    # Unscales the gradients of optimizer's assigned params in-place
    scaler.unscale_(optimizer)

    # Since the gradients of optimizer's assigned params are unscaled, clips as usual:
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

    # optimizer's gradients are already unscaled, so scaler.step does not unscale them,
    # although it still skips optimizer.step() if the gradients contain infs or NaNs.
    scaler.step(optimizer)

    # Updates the scale for next iteration.
    scaler.update()
    ```
- Reference: https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping
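
As a rough illustration of what the `clip_grad_norm_` call in Rahul's answer does, the plain-Python sketch below rescales a list of gradient values so that their combined L2 norm does not exceed `max_norm`. The function name `clip_by_global_norm`, the flat list of floats, and the `eps` value are assumptions made for this example. Real PyTorch code works on parameter tensors and should call `torch.nn.utils.clip_grad_norm_` directly.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Return (clipped_grads, total_norm) for a flat list of gradient values.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    # L2 norm over all gradient values taken together.
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale factor; eps (an assumed small constant) avoids division by zero.
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Norm exceeds max_norm: shrink every value by the same factor,
        # which keeps the direction of the gradient vector unchanged.
        grads = [g * clip_coef for g in grads]
    return grads, total_norm


# An "exploding" gradient with norm 50 is scaled down to a norm of about 1.
big, big_norm = clip_by_global_norm([30.0, 40.0], max_norm=1.0)
# A small gradient with norm 0.5 is left untouched.
small, small_norm = clip_by_global_norm([0.3, 0.4], max_norm=1.0)
print(big_norm, big)
print(small_norm, small)
```

Because every value is multiplied by the same factor, clipping by norm limits the step size without changing the update direction.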
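
To see why hkchengrex's answer calls `scaler.unscale_(optimizer)` before clipping, the plain-Python sketch below compares clipping scaled gradients with clipping unscaled ones. The loss scale of 65536, the helper names, and the gradient values are illustrative assumptions, not values taken from PyTorch or from the original answer.

```python
import math


def l2_norm(grads):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Hypothetical stand-in for norm clipping on a flat list of floats."""
    clip_coef = max_norm / (l2_norm(grads) + eps)
    return [g * clip_coef for g in grads] if clip_coef < 1.0 else list(grads)


true_grads = [0.3, 0.4]   # true gradient, norm 0.5 (below max_norm)
loss_scale = 65536.0      # assumed loss-scale factor for illustration
max_norm = 1.0

# With AMP, backward() on the scaled loss yields gradients multiplied by the scale.
scaled_grads = [g * loss_scale for g in true_grads]

# Wrong order: clip the scaled gradients, then unscale.
# The scaled norm (about 32768) far exceeds max_norm, so the gradient is
# clipped hard even though the true norm is only 0.5.
wrong = [g / loss_scale for g in clip_by_global_norm(scaled_grads, max_norm)]

# Right order: unscale first, then clip against the true norm.
right = clip_by_global_norm([g / loss_scale for g in scaled_grads], max_norm)

print("clip then unscale:", l2_norm(wrong))
print("unscale then clip:", l2_norm(right))
```

In this sketch, clipping before unscaling shrinks a gradient that was never too large, while unscaling first leaves it unchanged, so `max_norm` keeps the meaning it has in full-precision training.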
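
Charles Xu's answer reports that clipping still gave `nan`. One possible reason, which the answer itself does not state, is that a gradient that is already NaN cannot be repaired by rescaling. The plain-Python sketch below shows this with a hypothetical `clip_by_global_norm` helper. The finite-norm guard at the end is an assumption of this sketch, not something taken from the answers.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Hypothetical stand-in for norm clipping on a flat list of floats."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    clip_coef = max_norm / (total_norm + eps)
    clipped = [g * clip_coef for g in grads] if clip_coef < 1.0 else list(grads)
    return clipped, total_norm


# One gradient value is already NaN before clipping is applied.
grads = [0.3, float("nan"), 0.4]
clipped, total_norm = clip_by_global_norm(grads, max_norm=1.0)

# The norm itself is NaN, so the comparison in the helper is False and nothing
# is rescaled; the NaN entry is still present after "clipping".
print(math.isnan(total_norm), clipped)

# A simple guard (an assumption of this sketch): skip the update
# when the gradient norm is not finite.
should_step = math.isfinite(total_norm)
print("apply optimizer step:", should_step)
```

In other words, clipping bounds large but finite gradients. Once a NaN appears, the cause has to be fixed elsewhere, for example by changing the optimizer as Charles Xu did.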