Inplace Issues in PyTorch

dongyuqing1987

Tags: pytorch, deep learning, neural networks

Article link: https://blog.csdn.net/dongyuqing1987/article/details/121903930

Background:
Inplace in activation functions such as ReLU:
"+=" is inplace by default:
The error message
A few closing words

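Background:

While porting a model's training code from Caffe to PyTorch recently, I ran into a strange inplace pitfall. I'm writing it down here so that others can avoid falling into it.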

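Inplace in activation functions such as ReLU:

The official ReLU takes an inplace argument, and nn.ReLU(inplace=True) performs the operation in place. Thinking this would save memory, I set it to True without hesitation. Once the model was built and running, it failed during backward while computing gradients: the error said that an inplace operation had modified a value needed for the gradient computation. Changing it to **nn.ReLU(inplace=False)** fixed the problem under PyTorch 1.3 and PyTorch 1.8.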
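
To see why inplace=True can break backward, here is a minimal sketch on a toy graph of my own (not the original model; the sigmoid layer and the tensor size are arbitrary choices for illustration). torch.sigmoid saves its output for the backward pass, so letting nn.ReLU(inplace=True) overwrite that very tensor invalidates what autograd saved:

```python
import torch
import torch.nn as nn


def relu_backward_ok(inplace: bool) -> bool:
    """Run a tiny sigmoid -> ReLU graph and report whether backward succeeds."""
    torch.manual_seed(0)
    x = torch.randn(4, requires_grad=True)
    y = torch.sigmoid(x)             # SigmoidBackward0 saves its output y for backward
    z = nn.ReLU(inplace=inplace)(y)  # inplace=True overwrites y itself and bumps its version
    try:
        z.sum().backward()
        return True
    except RuntimeError as err:
        # Same family of error as in this post: "... modified by an inplace operation ..."
        print(err)
        return False


print(relu_backward_ok(inplace=True))   # False: sigmoid's saved output was modified
print(relu_backward_ok(inplace=False))  # True
```

In other words, inplace=True only causes trouble when the tensor it overwrites is one that an earlier operation saved for its own backward pass; sigmoid happens to save its output, which is why it is used in this sketch.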

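"+=" is inplace by default:

Later, after upgrading to PyTorch 1.10, the error came back, in exactly the same form as above. Looking into it, I found that "+=" is also an inplace operation. In the network's forward pass I changed x += feature16 to x = x + feature16, and the problem disappeared.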
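
A minimal sketch of the same pattern, assuming x comes straight out of a ReLU as the error message below suggests; forward and backward_ok are hypothetical helpers and the shapes are arbitrary. ReLU's backward reuses its own output, so x += feature16 overwrites a saved tensor, while x = x + feature16 allocates a new one and leaves the saved output untouched:

```python
import torch
import torch.nn.functional as F


def forward(a: torch.Tensor, feature16: torch.Tensor, inplace_add: bool) -> torch.Tensor:
    # Hypothetical stand-in for the network's forward pass.
    x = F.relu(a)              # ReLU's backward saves this output x
    if inplace_add:
        x += feature16         # in-place: overwrites the saved ReLU output
    else:
        x = x + feature16      # out-of-place: new tensor, saved output untouched
    return x


def backward_ok(inplace_add: bool) -> bool:
    """Return True if backward succeeds for the chosen form of the addition."""
    torch.manual_seed(0)
    a = torch.randn(2, 3, requires_grad=True)
    feature16 = torch.randn(2, 3)
    out = forward(a, feature16, inplace_add)
    try:
        out.sum().backward()
        return True
    except RuntimeError as err:
        print(err)  # the "modified by an inplace operation" error
        return False


print(backward_ok(inplace_add=True))   # False
print(backward_ok(inplace_add=False))  # True
```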

The error message

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [×, ×, ×, ×]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
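
The "version" in this message comes from a per-tensor counter that every in-place operation increments; autograd records the version when it saves a tensor and compares it again during backward. The sketch below assumes a toy setup where x is a ReLU output and feature16 is added to it in place (inspect_inplace_bug is a hypothetical helper). It reads the counter through the internal, underscore-prefixed _version attribute, which is meant for inspection only, and runs both passes under torch.autograd.detect_anomaly(), the context-manager form of the hint in the message, so PyTorch also prints the forward traceback of the failing operation:

```python
import torch
import torch.nn.functional as F


def inspect_inplace_bug():
    """Return the ReLU output's version before/after "+=" and whether backward failed."""
    torch.manual_seed(0)
    a = torch.randn(2, 3, requires_grad=True)
    feature16 = torch.randn(2, 3)
    versions = []
    # Run forward AND backward inside the context so the forward traceback is recorded.
    with torch.autograd.detect_anomaly():
        x = F.relu(a)
        versions.append(x._version)  # freshly created ReLU output
        x += feature16
        versions.append(x._version)  # bumped by the in-place add
        try:
            x.sum().backward()
            failed = False
        except RuntimeError:
            failed = True
    return versions, failed


versions, failed = inspect_inplace_bug()
print(versions, failed)  # [0, 1] True
```

Anomaly detection slows execution noticeably, so it is best switched on only while tracking down this kind of error.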