Debugging feature for "modified by an inplace operation" errors

Feature

Better error logging for inplace operations that throw errors in automatic differentiation.

Motivation

In complex computational graphs, locating the operation causing the error is nontrivial. A feature for isolating the offending operation would save a lot of developer time.

Pitch

See here for a more concrete suggestion. To quote:

I wonder whether it might be worth adding a "debug" mode that records the stack of the op in the forward pass and spits it out on error in the backward. That way, it would point to the right line of code directly.
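
As a rough manual approximation of this idea (a hedged sketch, not an existing PyTorch feature and not the mechanism proposed above), one can record the Python stack right after the forward op and print it when backward fails; the toy tensors and the use of traceback.format_stack are illustration choices:

import traceback
import torch

a = torch.rand(1, requires_grad=True)
b = a.exp()                               # ExpBackward saves its output (b)
forward_stack = traceback.format_stack()  # roughly where b was produced
b.mul_(5)                                 # in-place op invalidates the saved output

try:
    b.backward()
except RuntimeError:
    print("backward failed; forward-pass stack of the op that produced b:")
    print("".join(forward_stack))
    raise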

Alternatives

Just being able to log all inplace operations would be useful.
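
One way to approximate such logging in user code (a hedged sketch, assuming a PyTorch version that provides torch.overrides.TorchFunctionMode; the trailing-underscore check is only a naming-convention heuristic) is to intercept torch calls with a function mode:

import torch
from torch.overrides import TorchFunctionMode

class LogInplaceOps(TorchFunctionMode):
    """Print every torch call whose name follows the trailing-underscore
    in-place convention (add_, exp_, ...). Augmented assignments such as
    b += 1 dispatch as dunders (__iadd__) and are not caught by this check."""
    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        name = getattr(func, "__name__", str(func))
        if name.endswith("_") and not name.endswith("__"):
            print(f"in-place op: {name}")
        return func(*args, **kwargs)

with LogInplaceOps():
    a = torch.rand(3, requires_grad=True)
    b = a * 2
    b.add_(1)  # logged
    b.exp_()   # logged

This only records which in-place calls ran; it does not by itself tell you which one broke a given backward pass.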

Question: can't you use the anomaly_detection for that?

Here is an example where it shows the exact line on which the error occurs:

import torch

with torch.autograd.set_detect_anomaly(True):
    a = torch.rand(1, requires_grad=True)
    c = torch.rand(1, requires_grad=True)

    b = a ** 2 * c ** 2
    b += 1
    b *= c + a

    d = b.exp_()
    d *= 5

    b.backward()

And the stack trace:

sys:1: RuntimeWarning: Traceback of forward call that caused the error:
  File "tst.py", line 13, in <module>
    d = b.exp_()

Traceback (most recent call last):
  File "tst.py", line 16, in <module>
    b.backward()
  File "/Users/fmassa/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/Users/fmassa/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This points directly to b.exp_(), and indeed, if you change that line to b.exp(), it all works fine.

Closing as this can be obtained with anomaly_detection.

Great, thanks. This might be a good thing to add to the section on in-place operations in the autograd mechanics tutorial. I'm happy to add a sentence or two, but the contributing doc doesn't mention how to modify tutorials or notes. 

To clarify for other readers, the anomaly detection will not necessarily point you at the inplace operation that caused the failure. Instead, it will point you at the operation that could not compute its gradient in the backward pass. The inplace operation to blame may occur anywhere after that, modifying one of the tensors that participated in the line found by the anomaly detection.

Example:

import torch
import numpy as np

x = torch.rand(10, 20, requires_grad=True)
y = torch.rand(10)
z = (x / y[:, np.newaxis])  # anomaly detection will point here
c = y.abs_()                # but the problem is here
z.sum().backward()


The last line will raise a RuntimeError. With anomaly detection enabled, the traceback will point at the line performing the division, but the offending in-place operation happens later.
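
For completeness, a minimal sketch of how the example above can be fixed (an assumption in the spirit of the earlier b.exp_() to b.exp() suggestion, not code from the thread): replace the in-place abs with an out-of-place one, or apply it to a copy.

import torch
import numpy as np

x = torch.rand(10, 20, requires_grad=True)
y = torch.rand(10)

z = x / y[:, np.newaxis]  # same division as above
c = y.abs()               # out-of-place: the y saved for backward stays intact
# or, if the in-place update is really needed, apply it to a copy:
# c = y.clone().abs_()
z.sum().backward()        # now succeeds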
