Not long after starting with PyTorch, I ran into this question: since we need to compute gradients anyway (PyTorch computes each parameter's gradient with loss.backward() and stores it in that parameter's .grad attribute), why do we have to reset the gradients from the previous iteration to zero before computing the new ones? The main reason is:
In torch.autograd, loss.backward() does not overwrite a parameter's .grad; it adds the newly computed gradient to whatever value is already stored there. If the gradients left over from the previous iteration are not reset to zero, the new gradients are accumulated on top of them, and the next parameter update would be based on a running sum of gradients from several batches rather than the gradient of the current batch alone. Zeroing the gradients first ensures that, after the next backward pass, .grad holds exactly the gradient of the current loss.
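A minimal sketch of this accumulation behavior, using a made-up single parameter and a trivial loss whose gradient is easy to check by hand:

```python
import torch

# A single parameter w and a trivial loss: loss = (w * 2).sum(),
# so d(loss)/dw = 2 for every element of w.
w = torch.ones(3, requires_grad=True)

loss = (w * 2).sum()
loss.backward()
print(w.grad)        # tensor([2., 2., 2.])

# Backward again without zeroing: the new gradient is ADDED to .grad.
loss = (w * 2).sum()
loss.backward()
print(w.grad)        # tensor([4., 4., 4.]) -- accumulated, not replaced

# Reset first (this is what optimizer.zero_grad() does for all parameters).
w.grad.zero_()
loss = (w * 2).sum()
loss.backward()
print(w.grad)        # tensor([2., 2., 2.]) again
```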
Official explanation:
We then set the gradients to zero, so that we are ready for the next loop. Otherwise, our gradients would record a running tally of all the operations that had happened (i.e. loss.backward() adds the gradients to whatever is already stored, rather than replacing them).
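In practice this is why a training loop calls zero_grad() once per step, before backward(). Below is a rough sketch of that placement; the model, optimizer, and dummy data are just placeholder assumptions for illustration:

```python
import torch
from torch import nn, optim

# Hypothetical tiny model and data, only to show where zero_grad() fits.
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(32, 10)   # dummy input batch
y = torch.randn(32, 1)    # dummy targets

for step in range(100):
    optimizer.zero_grad()          # clear gradients left over from the previous step
    loss = criterion(model(x), y)
    loss.backward()                # writes fresh gradients into each parameter's .grad
    optimizer.step()               # update parameters using those gradients
```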
Reference:
https://pytorch.org/tutorials/beginner/nn_tutorial.html