Running a PyTorch model failed with this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [720, 64, 36, 36]], which is output 0 of TanhBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
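The hint at the end of the message can be followed literally. Below is a minimal sketch of turning on anomaly detection, using a tiny CPU tensor and a made-up in-place add rather than the real model (the names x, y and caught are illustrative only). With anomaly detection on, PyTorch additionally prints the forward-pass traceback of the operation whose backward failed.

```python
import torch

# Toy stand-in for the real model input; the shape is arbitrary.
x = torch.randn(2, 3, requires_grad=True)

caught = None
# Context-manager form; torch.autograd.set_detect_anomaly(True) enables it globally.
with torch.autograd.detect_anomaly():
    y = torch.tanh(x)   # tanh keeps its output for the backward pass
    y += 1.0            # in-place edit of that saved output
    try:
        y.sum().backward()
    except RuntimeError as err:
        caught = err    # same error as above, now with a forward traceback in the warnings

print(type(caught).__name__)
```

Anomaly detection slows training noticeably, so it is best switched on only while debugging.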
The shape [720, 64, 36, 36] in the message was enough to locate the part of the model that raised the error.
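A quick search online suggested that this tends to show up under distributed training: when the model does Y += X, the data in Y ends up modified in place while autograd still needs the old value. Here Y is the output of tanh, which autograd saves for the backward pass, so the in-place add bumps its version from 0 to 1 and the version check fails. The fix is to make the addition out-of-place: change Y += X to Y = Y + X, or to Y = Y.clone() + X to be explicit.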
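The failure and the fix can be reproduced on the CPU without distributed training. This is a minimal sketch, not the original model: forward_inplace, forward_fixed, x and skip are hypothetical names, and the small tensors stand in for the [720, 64, 36, 36] activation.

```python
import torch

def forward_inplace(x, skip):
    y = torch.tanh(x)        # autograd saves y (the tanh output) for backward
    y += skip                # in-place add: y goes from version 0 to version 1
    return y.sum()

def forward_fixed(x, skip):
    y = torch.tanh(x)
    y = y.clone() + skip     # out-of-place: the saved tanh output is untouched
    return y.sum()

x = torch.randn(4, 3, requires_grad=True)
skip = torch.randn(4, 3)

# The in-place version fails during backward, not during forward.
try:
    forward_inplace(x, skip).backward()
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = "inplace operation" in str(err)

# The fixed version backpropagates normally; d/dx tanh(x) = 1 - tanh(x)^2.
x.grad = None
forward_fixed(x, skip).backward()
grad_ok = torch.allclose(x.grad, 1 - torch.tanh(x.detach()) ** 2)
```

Plain y = y + skip works as well, since it also allocates a new tensor; the explicit clone() only makes the intent more visible.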