RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 4; expected version 3 instead.
Error description
The following error is raised when calling loss.backward():
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
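The hint in the message can be tried on a toy case. The sketch below is not the failing model from this post; it is a minimal, made-up example (tensor names x and y are hypothetical) that triggers the same class of error on the CPU, with anomaly detection switched on so that PyTorch also prints the forward-pass traceback of the operation whose gradient could not be computed.

```python
import torch

# Ask autograd to record forward tracebacks so a failing backward op can be located.
torch.autograd.set_detect_anomaly(True)

x = torch.ones(3, requires_grad=True)
y = x.exp()   # exp() saves its output because backward needs it
y.add_(1)     # in-place edit bumps the version of that saved tensor

message = ""
try:
    y.sum().backward()
except RuntimeError as err:
    # Same class of error as in this post, reproduced on a small CPU tensor.
    message = str(err)

print(message)

# Anomaly detection slows training down, so turn it off once debugging is done.
torch.autograd.set_detect_anomaly(False)
```

In a real training script the set_detect_anomaly(True) call would go before the forward pass, and the extra traceback printed to stderr points at the forward operation involved.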
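Attempted fixes
- Changed nn.ReLU() to nn.ReLU(inplace=False): did not help (nn.ReLU() already defaults to inplace=False, so this changes nothing);
- Changed nn.ReLU() to nn.ReLU6(): did not help;
- Passed broadcast_buffers=False to torch.nn.parallel.DistributedDataParallel(…, broadcast_buffers=False, …): this resolved the error.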
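For reference, here is a rough sketch of where the flag from the last item goes. According to the PyTorch documentation, broadcast_buffers controls whether module buffers (for example BatchNorm running statistics) are synchronized at the beginning of each forward call. Everything else in the snippet is an assumption made for illustration: the tiny model, the gloo backend on the CPU, the single-process group, and the temporary file used for rendezvous are not taken from the original code.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumption: a single-process, CPU-only group so the sketch runs anywhere.
store_path = os.path.join(tempfile.mkdtemp(), "ddp_store")
dist.init_process_group(
    backend="gloo",
    init_method=f"file://{store_path}",
    rank=0,
    world_size=1,
)

# Hypothetical stand-in model; BatchNorm1d(512) owns buffers of shape [512].
model = nn.Sequential(nn.Linear(8, 512), nn.BatchNorm1d(512), nn.ReLU())

# The fix from this post: do not broadcast buffers at the start of forward().
ddp_model = DDP(model, broadcast_buffers=False)

loss = ddp_model(torch.randn(4, 8)).sum()
loss.backward()

grads_ok = all(p.grad is not None for p in model.parameters())

dist.destroy_process_group()
```

With real multi-GPU training the process group would normally be created by the launcher (backend "nccl", one process per GPU) and device_ids would be passed to DistributedDataParallel. Note that with broadcast_buffers=False each process keeps its own buffer values, so BatchNorm running statistics may differ between ranks.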
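Analysis
I was running on a single node with a single GPU, although the code supports multiple GPUs.
The exact cause is unknown.
Reference for a similar issue: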
https://github.com/NVlabs/FUNIT/issues/23