Recently, while hacking on a network that contains an InplaceABN module, I ran into the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 256, 7, 7]], which is output 0 of InPlaceABNBackward, is at version 3; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
When I first used InplaceABN, I had never studied its paper or code, so fixing this took several hours of blind trial and error. I knew that consecutive inplace operations were the cause, but I could not pinpoint which block of which module was responsible, and kept sprinkling clone() in the wrong places. Only the next day, after reading through the project's GitHub issues, did I truly understand the root cause.
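Before digging into InplaceABN itself, note that this class of error is easy to reproduce in plain PyTorch. A minimal sketch (my own illustration, not the network from this post) trips the same version-counter check using ReLU, whose backward also relies on its saved output:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.relu(x)   # autograd saves y to compute ReLU's backward
y += 1              # in-place op bumps y's version counter
y.sum().backward()  # RuntimeError: ... modified by an inplace operation
```

Autograd records a tensor's version when it is saved for backward; any later in-place write increments the version, and the mismatch is detected at backward() time, which is exactly the "is at version 3; expected version 1" message above.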
1. The blocks provided by InplaceABN
- ABN is standard BN + activation (no memory savings).
- InPlaceABN is BN + activation done inplace (with memory savings).
- InPlaceABNSync is BN + activation done inplace, with batch statistics synchronized across multiple GPUs (with memory savings).
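For reference, here is a minimal usage sketch, assuming the mapillary/inplace_abn package. A Conv layer followed by InPlaceABN replaces the usual Conv -> BN -> LeakyReLU triple:

```python
import torch.nn as nn
from inplace_abn import InPlaceABN  # pip install inplace-abn

# InPlaceABN fuses BN + activation and overwrites the conv output buffer,
# so the pre-activation tensor never has to be kept around for backward.
block = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=3, padding=1, bias=False),
    InPlaceABN(256, activation="leaky_relu", activation_param=0.01),
)
```

The memory saving comes precisely from this overwriting: since the input is gone, the backward pass must be reconstructed from the block's output, which is why that output must not be touched in place afterwards.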

To summarize: this post has described the runtime error encountered while modifying a network containing the InplaceABN module, walked through how InplaceABN works, and focused on why consecutive inplace operations make gradient computation fail. Tracing the actual case together with the clues from GitHub issues shows how to locate and fix the problem, and should serve as a guide for debugging similar errors.
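To make the failure pattern concrete, here is a hypothetical residual block of the kind discussed in the project's GitHub issues; the class and layer names are my own illustration, not the actual network from this post:

```python
import torch.nn as nn
from inplace_abn import InPlaceABN

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.abn = InPlaceABN(channels)

    def forward(self, x):
        identity = x
        out = self.abn(self.conv(x))
        # BUG: self.abn's backward needs `out` unmodified, but `out += identity`
        # would bump its version counter and fail at backward() with the
        # "output 0 of InPlaceABNBackward ... expected version 1" error.
        # FIX: use an out-of-place add instead of `+=`:
        out = out + identity
        return out
```

In other words, the clone() I kept trying was pointless unless it broke the chain of in-place writes on the tensor that InPlaceABN saved for backward; replacing the in-place accumulation that follows the block is what actually resolves the error.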