An error caused by in-place operations in PyTorch: RuntimeError: one of the variables needed for gradient computation has been mod

夤夜Shinya

Last modified: 2023-07-13 15:05:59

Tags: pytorch, deep learning, python

First published: 2023-05-08 22:04:27

Article link: https://blog.csdn.net/qq_30594197/article/details/130568443

Resolving the error

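I ran into this problem while using PyTorch to train adversarial examples for attacking a yolov5 model. It is caused by in-place operations that the yolov5 model applies to the data as it passes through the model.
The error message is as follows: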

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [1, 3, 128, 128, 47]], which is output 0 of SigmoidBackward0, is at version 2; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

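In plain terms, one of the variables needed for gradient computation has been modified by an in-place operation. [torch.cuda.HalfTensor [1, 3, 128, 128, 47]] is the type and shape of the tensor that was modified in place, and SigmoidBackward0 is that tensor's grad_fn. "version 2" and "version 0" refer to the tensor's version counter: PyTorch increments it on every in-place modification, and autograd records its value when it saves a tensor for the backward pass. Here the output of sigmoid was saved at version 0 but had reached version 2 by the time backward ran, which means it was modified in place twice in between.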
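The version counter can be observed on a toy tensor. The following is a minimal sketch, not the yolov5 code: the tensor shape and the arithmetic are made up for illustration, and `_version` is an internal attribute that is used here only for inspection.

```python
import torch

x = torch.randn(2, 4, requires_grad=True)
y = x.sigmoid()                    # sigmoid saves its output for the backward pass
version_at_save = y._version       # counter value right after y is created

y[..., 0:2] = y[..., 0:2] * 2.0    # first in-place write into y
y[..., 2:4] = y[..., 2:4] * 2.0    # second in-place write into y
version_at_backward = y._version   # the counter has grown with each write

message = ""
try:
    y.sum().backward()             # autograd notices the saved output has changed
except RuntimeError as err:
    message = str(err)

print(version_at_save, version_at_backward)
print(message)
```

The printed message should have the same form as the one above, only with the type and shape of the toy tensor.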

Next, look at the operation traceback printed when the code runs inside a with torch.autograd.set_detect_anomaly(True) block:

d:\<path>\Anaconda3\envs\pytorch2_0\lib\site-packages\torch\autograd\__init__.py:200: UserWarning: Error detected in SigmoidBackward0. 
Traceback of forward call that caused the error:
  File "d:\Program Files\Anaconda3\envs\pytorch2_0\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
……
……
  File "d:\<my file path>\yolov5-5.0\.\models\yolo.py", line 122, in forward
    y1 = x1.sigmoid()
 (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\autograd\python_anomaly_mode.cpp:119.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
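For reference, a traceback of this kind can be reproduced on a toy example. This is a minimal sketch under the assumption that the forward pass is run inside the context manager; `faulty_forward` is a hypothetical stand-in, not the yolov5 Detect layer.

```python
import torch

def faulty_forward(x):
    # Hypothetical stand-in for a forward pass with an in-place write.
    y = x.sigmoid()
    y[..., 0:2] = y[..., 0:2] * 2.0   # in-place write into the saved sigmoid output
    return y.sum()

x = torch.randn(2, 4, requires_grad=True)
raised = False

# Anomaly mode records the forward-pass stack for each operation, so the
# warning printed before the error points at the line that created the tensor.
with torch.autograd.set_detect_anomaly(True):
    loss = faulty_forward(x)
    try:
        loss.backward()
    except RuntimeError as err:
        raised = True
        print(err)
```

Anomaly detection slows training down, so it is probably best enabled only while debugging.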

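The traceback from yolo.py clearly identifies y1 as the problematic variable. (This is the Detect class in yolo.py of yolov5 v5.0, although I modified it slightly in order to train adversarial examples.) So let's look at what happens to y1: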

y1 = x1.sigmoid()
y1[..., 0:2] = (y1[..., 0:2] * 2. - 0.5 + self.grid[1]) * self.stride[1]  # xy
y1[..., 2:4] = (y1[..., 2:4] * 2) ** 2 * self.anchor_grid[1]  # wh

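Modifying the contents of y1 directly by index like this is a kind of in-place operation. (Note that it is in-place not because y1[..., 0:2] appears on the right-hand side of the equals sign, but because y1[..., 0:2] is written on the left-hand side: assigning to part of y1 is itself an in-place operation.) The fix is simple. Instead of modifying y1 directly as in the code above, take out the parts to be changed, do the arithmetic on them separately, and finally concatenate the pieces with torch.cat. The modified code is as follows: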

y1 = x1.sigmoid()
y1_02 = (y1[..., 0:2] * 2. - 0.5 + self.grid[1]) * self.stride[1]  # xy
y1_24 = (y1[..., 2:4] * 2) ** 2 * self.anchor_grid[1]  # wh
y1_4 = y1[..., 4:]
# y1[..., 0:2] = (y1[..., 0:2] * 2. - 0.5 + self.grid[1]) * self.stride[1]  # xy
# y1[..., 2:4] = (y1[..., 2:4] * 2) ** 2 * self.anchor_grid[1]  # wh
y1 = torch.cat((y1_02, y1_24, y1_4), dim=-1)

That is all it takes.
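As a sanity check, the two variants can be compared on toy data. The sketch below is only an illustration: `decode_inplace` and `decode_cat` are hypothetical helper names, and the shapes, the stride value and the random grid and anchor tensors are assumptions that do not come from yolov5.

```python
import torch

def decode_inplace(x, grid, stride, anchor_grid):
    # Original pattern: slice assignment writes into y in place.
    y = x.sigmoid()
    y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride  # xy
    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * anchor_grid       # wh
    return y

def decode_cat(x, grid, stride, anchor_grid):
    # Fixed pattern: compute each part separately, then concatenate.
    y = x.sigmoid()
    xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride
    wh = (y[..., 2:4] * 2) ** 2 * anchor_grid
    rest = y[..., 4:]
    return torch.cat((xy, wh, rest), dim=-1)

torch.manual_seed(0)
x = torch.randn(1, 3, 4, 4, 7, requires_grad=True)  # toy shape, assumed
grid = torch.rand(1, 3, 4, 4, 2)
anchor_grid = torch.rand(1, 3, 1, 1, 2)
stride = 8.0

# Without gradient tracking both variants run and give the same values.
with torch.no_grad():
    same_values = torch.allclose(
        decode_inplace(x, grid, stride, anchor_grid),
        decode_cat(x, grid, stride, anchor_grid),
    )
print(same_values)

# With gradient tracking, the torch.cat variant backpropagates without error.
decode_cat(x, grid, stride, anchor_grid).sum().backward()
print(x.grad.shape)
```

The torch.cat variant allocates a new output tensor, so it uses somewhat more memory than the in-place version, but it leaves the saved sigmoid output untouched.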

In-place operations to avoid in PyTorch

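The error message at the top of this post shows that, in a model, a tensor that autograd needs for computing gradients must not be modified in place. However, many blog posts do not explain clearly what counts as an in-place modification. Some suggest "changing operations like "a+=b" in your code to "c = a + b"", which invites a misconception: if x+=1 is an in-place operation, is x=x+1 one too?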

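The answer is no. PyTorch author Adam Paszke explains this in a reply on the official forum. (The quotes are rendered from a machine translation, so the wording is approximate.)

x = x + 1 is not in-place, because it takes the object pointed to by x, creates a new variable, adds 1 to x, puts the result in the new variable, and overwrites the object referenced by x to point to the new variable. Nothing is modified in place; only the Python reference changes (you can check that id(x) differs before and after that line).

In the very next paragraph he adds:

On the other hand, doing x += 1 or x[0] = 1 modifies the variable's data in place, so no copy is made. However, some functions (in your case *) require that their inputs never change after they have computed the output, otherwise they cannot compute the gradient. That is why the error is raised.

So in Python, assigning to a tensor by index is an in-place operation, which is exactly what caused the problem in the previous section.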
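The distinction can be checked directly. This is a minimal sketch with made-up values, again using the internal `_version` attribute for inspection only.

```python
import torch

# Out-of-place: a + 1 builds a new tensor and the name a is rebound to it.
a = torch.zeros(3)
a_id = id(a)
a = a + 1
new_object = id(a) != a_id    # the name now refers to a different object
a_version = a._version        # a freshly created tensor has no in-place writes

# In-place: += and indexed assignment write into the existing tensor.
b = torch.zeros(3)
b_id = id(b)
b += 1
b[0] = 5
same_object = id(b) == b_id   # still the same Python object
b_version = b._version        # bumped by the in-place writes

print(new_object, a_version)
print(same_object, b_version)
```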

References

1. https://blog.csdn.net/m0_38129460/article/details/90405086
2. https://discuss.pytorch.org/t/encounter-the-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/836/4
