The code contains three networks that need to be updated. A and B each use the other's output to compute a loss term whose gradient should not be passed back to the other network, so in both terms every variable coming from the other network needs:
X.detach()
Otherwise, after network B has been updated, computing A's loss and updating A raises the error:
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 32, 3, 3]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
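A minimal sketch of the detach pattern (net_A, net_B, the input x, and the squared-error losses are all hypothetical stand-ins, not the original code):

import torch
import torch.nn as nn

# Hypothetical stand-ins for the two interacting networks.
net_A = nn.Linear(8, 8)
net_B = nn.Linear(8, 8)
x = torch.randn(4, 8)

out_A = net_A(x)
out_B = net_B(x)

# Each loss uses the OTHER network's output only as a target:
# detaching it keeps the other network's parameters out of
# this loss's autograd graph.
loss_A = ((out_A - out_B.detach()) ** 2).mean()
loss_B = ((out_B - out_A.detach()) ** 2).mean()

Without the detach, loss_A's graph would contain net_B's parameters; once B's optimizer step modifies them in place (bumping their version counters), backpropagating loss_A fails with exactly the version-mismatch error above.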
Network A is updated first. Even though this part of the loss is not needed later to update B or C, the backward call still has to be:
loss_A.backward(retain_graph=True)
Otherwise the error is:
Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
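The retain_graph requirement suggests the losses share part of a single forward graph. Continuing the sketch above under that assumption, with a hypothetical shared feature extractor net_C and placeholder SGD optimizers:

# Hypothetical shared extractor: its output feeds both A and B,
# so loss_A and loss_B share the net_C part of the graph.
net_C = nn.Linear(8, 8)
feat = net_C(x)
out_A = net_A(feat)
out_B = net_B(feat)

loss_A = ((out_A - out_B.detach()) ** 2).mean()
loss_B = ((out_B - out_A.detach()) ** 2).mean()

opt_A = torch.optim.SGD(net_A.parameters(), lr=0.01)
opt_B = torch.optim.SGD(net_B.parameters(), lr=0.01)

opt_A.zero_grad()
# retain_graph=True keeps the saved tensors of the shared net_C
# subgraph alive so loss_B.backward() can still run afterwards.
loss_A.backward(retain_graph=True)
opt_A.step()

opt_B.zero_grad()
loss_B.backward()  # without retain_graph above, this raises the error
opt_B.step()

When the update order does not matter, an alternative is to combine the losses and call backward once, which avoids retain_graph entirely.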
In the end I added the detach calls and moved the update of A's parameters to the very front. That solved the problem, but I don't understand why, when A is updated last, the first error is still raised even if zero_grad is called first. Any explanation would be appreciated QAQ
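One guess at the failure (an assumed setup, not a confirmed diagnosis of the original code): if loss_A still backpropagates through parameters that another optimizer has already stepped, the version check fails no matter when zero_grad was called, because zero_grad only clears accumulated gradients and does not undo the in-place parameter update. A sketch that reproduces the first error this way, reusing the names above:

# Here A's loss flows THROUGH net_B, so A's backward needs
# net_B.weight exactly as it was saved during the forward pass.
out = net_B(net_A(x))
loss_B = (out ** 2).mean()
loss_A = ((out - 1.0) ** 2).mean()

opt_B.zero_grad()
loss_B.backward(retain_graph=True)
opt_B.step()  # modifies net_B.weight in place: version 1 -> 2

opt_A.zero_grad()  # clears gradients, but cannot reset version counters
loss_A.backward()  # RuntimeError: "... is at version 2; expected version 1"

Updating A before the other optimizers step, or detaching every tensor that would drag another network's parameters into loss_A's graph, keeps the saved versions consistent, which matches the fix described above.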