Zeroing the gradients
In deep-learning code, before computing the gradients for each training batch, we should call
optimizer.zero_grad()
to zero the gradients.
That’s because PyTorch accumulates gradients across backward passes rather than overwriting them. If you don’t zero the gradients, stale gradients from previous batches are added to the current ones, so the update step can point in a different direction than the intended descent toward the minimum.
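A minimal sketch of the accumulation behavior, using a toy linear model and random data (the model, loss, and optimizer here are illustrative assumptions, not from the text above):

```python
import torch

# Toy setup: a small linear model with fixed random data.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

def loss_fn():
    return torch.nn.functional.mse_loss(model(x), y)

# Without zeroing: a second backward pass ADDS to the stored gradient.
loss_fn().backward()
g1 = model.weight.grad.clone()
loss_fn().backward()                # gradient is accumulated, not replaced
g2 = model.weight.grad.clone()
print(torch.allclose(g2, 2 * g1))   # True: the gradient has doubled

# Correct pattern: zero the gradients before each backward pass.
optimizer.zero_grad()
loss_fn().backward()
print(torch.allclose(model.weight.grad, g1))  # True: fresh gradient again
optimizer.step()
```

With `optimizer.zero_grad()` in place, each `backward()` starts from a clean slate, so `optimizer.step()` moves the parameters using only the current batch's gradient.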