Fixing a PyTorch Bug: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Environment
Python: 3.9
PyTorch: 1.11.0
Bug description
```
Traceback (most recent call last):
  File "E:\Anaconda\envs\torch_c_13\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-18bbaf295f9c>", line 1, in <module>
    runfile('E:/Code/AEs by PyTorch/AEsingle_train_test_temp.py', wdir='E:/Code/AEs by PyTorch')
  File "E:\SoftWare\PyCharm\PyCharm 2021.2.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "E:\SoftWare\PyCharm\PyCharm 2021.2.3\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "E:/Code/AEs by PyTorch/AEsingle_train_test_temp.py", line 205, in <module>
    train_ae_x_h2 = AEtrain(AEmodel2, train_ae_x_h1, 10, "AEmodel2")
  File "E:/Code/AEs by PyTorch/AEsingle_train_test_temp.py", line 95, in AEtrain
    loss.backward()
  File "E:\Anaconda\envs\torch_c_13\lib\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "E:\Anaconda\envs\torch_c_13\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [784, 512]], which is output 0 of TBackward, is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
In short: one of the variables needed for gradient computation has been modified by an in-place operation.
Bug analysis
The traceback shows that the error is raised by loss.backward(). While computing gradients for the loss, autograd found that a tensor it had saved during the forward pass was modified in place, i.e. overwritten at its original memory location (as in a += 1) instead of the result being written to a new tensor. Because backpropagation applies the chain rule, autograd must reuse the values it saved in the forward pass; it tracks a version counter on each saved tensor, and if an in-place operation bumps that counter before backward() runs, the saved value no longer matches what the chain rule needs. Rather than silently compute wrong gradients, PyTorch raises this RuntimeError. Here the offending tensor is a weight matrix of shape [784, 512] at version 3 when version 2 was expected, which typically means a parameter was updated (e.g. by an optimizer step) while a graph that still depended on its old value had not yet been backpropagated.
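A minimal, self-contained reproduction (not the author's code; torch.sigmoid is chosen only because its backward pass needs its own output, so modifying that output in place trips the version check) illustrates the mechanism:

```python
import torch

a = torch.ones(3, requires_grad=True)
b = torch.sigmoid(a)   # autograd saves b: sigmoid's gradient is b * (1 - b)
b.add_(1)              # in-place op bumps b's version counter

try:
    b.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    #  modified by an inplace operation ... is at version 1; expected version 0"
    print(e)
```

As the hint in the error message suggests, enabling torch.autograd.set_detect_anomaly(True) before the forward pass makes PyTorch print an additional traceback pointing at the forward operation that caused the failure (here, b.add_(1)) instead of only the backward() call.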
Solutions
- Find the in-place operations in the model and change inplace=True to inplace=False, e.g. torch.nn.ReLU(inplace=False).
- Replace in-place updates such as a += b with out-of-place ones such as c = a + b.
- Some posts also suggest calling loss.backward(retain_graph=True). Note that retain_graph only controls whether the intermediate buffers of the graph are freed after backward() (with the default retain_graph=False they are released as soon as they are used), so it mainly addresses the "trying to backward through the graph a second time" error; for this version-mismatch error, fixing the in-place operation itself is the reliable solution.
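As a sketch of the second fix (hypothetical tensors, not the author's AEtrain code), writing the result to a new tensor leaves autograd's saved value intact, so backward() succeeds:

```python
import torch

a = torch.ones(3, requires_grad=True)
b = torch.sigmoid(a)
c = b + 1              # out-of-place: a new tensor is created, b is untouched
c.sum().backward()     # succeeds; d(sum)/da = b * (1 - b)

print(a.grad)          # gradient of sigmoid evaluated at 1.0, per element
```

The only change from the failing version is c = b + 1 in place of b.add_(1); the extra allocation is the price paid for keeping the saved forward values valid.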
Note: this is a good reason to build careful coding habits, avoid reusing and mutating variables carelessly, and keep your code clean and efficient.