I think this blogger's explanation is pretty good:
link
But here is the problem I'm currently running into:
import torch

n, d_in, H, d_out = 64, 1000, 100, 10  # batch size and layer sizes (example values)
x = torch.randn(n, d_in)
y = torch.randn(n, d_out)
w1 = torch.randn(d_in, H, requires_grad=True)
w2 = torch.randn(H, d_out, requires_grad=True)
lr = 1e-6
epoch = 500  # unused here; the loop below only runs 2 iterations to show the issue

for it in range(2):
    # forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)  # output shape: n * d_out

    # compute loss
    loss = (y_pred - y).pow(2).sum()

    # backward pass
    # 1. compute gradients
    loss.backward()
    print('Iteration {}: loss = {}, w1.requires_grad: {}, w2.requires_grad: {}'.format(
        it + 1, loss.item(), w1.requires_grad, w2.requires_grad))

    # 2. update the weights w1 and w2
    with torch.no_grad():
        w1 -= lr * w1.grad       # in-place update
        w2 = w2 - lr * w2.grad   # ordinary assignment (rebinds the name w2)
        print('w1.requires_grad:', w1.requires_grad)
        print('w2.requires_grad:', w2.requires_grad)
As you can see, when the update is done in place with '-=', w1 still requires grad after the update; but when the update is written as an ordinary subtraction and assignment, w2 no longer requires grad after the update.
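Here is a minimal sketch of my own (not from the blog post, with made-up tensor sizes) that isolates the two update styles outside of any training loop and shows the same behavior: the in-place op modifies the original leaf tensor, while the ordinary subtraction builds a brand-new tensor inside the no_grad block and rebinds the name to it.

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
grad = torch.ones(3)  # stand-in for a real gradient

with torch.no_grad():
    id_a_before, id_b_before = id(a), id(b)
    a -= 0.1 * grad      # in-place: the same tensor object is modified
    b = b - 0.1 * grad   # out-of-place: the name b now points to a new tensor

print(id(a) == id_a_before, a.requires_grad)  # True True
print(id(b) == id_b_before, b.requires_grad)  # False False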
So:
Question 1: why is there this difference between the two update styles?
Question 2: since both updates happen inside with torch.no_grad(), why does w1 still require grad?