PyTorch 3.2 autograd notes

BonAppetit

Tags: pytorch

Article link: https://blog.csdn.net/weixin_44185836/article/details/124431173

Autograd

requires_grad & the computation graph

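To compute the derivative with respect to a Tensor, set its requires_grad attribute to True.
A tensor does not require gradients by default (requires_grad defaults to False). If requires_grad is set to True on any node, every node that depends on it also has requires_grad set to True.
In PyTorch versions before 0.4, a Variable also had a volatile attribute, False by default. If volatile was set to True on one Variable, every node depending on it was volatile as well; volatile nodes were never differentiated, and volatile took precedence over requires_grad. Since PyTorch 0.4, Variable has been merged into Tensor and volatile has been replaced by torch.no_grad().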

import torch as t

x = t.ones(1)
b = t.rand(1, requires_grad=True)
w = t.rand(1, requires_grad=True)
y = w * x  # equivalent to y = w.mul(x)
z = y + b  # equivalent to z = y.add(b)
z.grad_fn  # the function used to backpropagate through z, e.g. <AddBackward0 at 0x7fb73c7cd490>
z.grad_fn.next_functions  # the inputs of grad_fn: a tuple of (Function, index) pairs
z.grad_fn.next_functions[0][0] == y.grad_fn  # True
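To make the propagation rule concrete, here is a minimal sketch that inspects the flags of each tensor in a graph like the one above. The is_leaf attribute is not mentioned in these notes; it is used here only as an illustration of which tensors were created by the user and which are results of operations.

```python
import torch

x = torch.ones(1)                        # plain input, no gradient needed
w = torch.rand(1, requires_grad=True)    # leaf tensor that needs a gradient
b = torch.rand(1, requires_grad=True)
y = w * x
z = y + b

# requires_grad propagates to everything computed from w or b
flags = (x.requires_grad, w.requires_grad, y.requires_grad, z.requires_grad)
print(flags)   # (False, True, True, True)

# tensors created directly are leaves; results of operations are not
leaves = (x.is_leaf, w.is_leaf, y.is_leaf, z.is_leaf)
print(leaves)  # (True, True, False, False)

# a leaf has no grad_fn because no operation produced it
print(w.grad_fn)  # None
```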

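When backward is called more than once, the gradients accumulate. The intermediate buffers of the graph are freed after each backward pass, so backpropagating through the same graph again requires retain_graph=True.

z.backward(retain_graph=True)  # retain_graph keeps the buffers for another backward pass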
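The accumulation behaviour can be checked with a small sketch. The fixed values 2.0 and 3.0 are chosen here purely for illustration, so that the expected gradients are easy to verify by hand.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)
x = torch.tensor([3.0])
z = w * x                      # dz/dw = x = 3

z.backward(retain_graph=True)  # keep the buffers so the graph can be reused
first = w.grad.clone()         # tensor([3.])

z.backward()                   # the second pass adds to the existing gradient
second = w.grad.clone()        # tensor([6.])

w.grad.zero_()                 # reset manually before the next accumulation
print(first, second, w.grad)
```

This is why training loops usually zero the gradients before each backward pass.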

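PyTorch uses a dynamic graph: the computation graph is rebuilt from scratch on every forward pass, so ordinary Python control flow (for, if, and so on) can be used to shape the graph as needed.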

def f(x):
    result = 1
    for ii in x:
        if ii.item()>0: 
            result=ii*result
    return result
x = t.arange(-2,4,dtype=t.float32).requires_grad_()
y = f(x) # y = x[3]*x[4]*x[5]
y.backward()
x.grad  # tensor([0., 0., 0., 6., 3., 2.])

Inside a with torch.no_grad() block no computation graph is built, which saves memory (including GPU memory).

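with t.no_grad():
    x = t.ones(1)
    w = t.rand(1, requires_grad=True)
    y = x * w
# y depends on w and x; although w.requires_grad is True, y.requires_grad is still False
x.requires_grad, w.requires_grad, y.requires_grad  # (False, True, False)
t.set_grad_enabled(False)  # turns graph construction off globally, not just inside a block
t.set_grad_enabled(True)   # turn it back on so the examples below still track gradients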
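As a rough sketch of how these switches behave, the snippet below compares the block form of no_grad with set_grad_enabled used as a context manager. torch.is_grad_enabled() does not appear in these notes and is used only to inspect the current mode.

```python
import torch

w = torch.rand(1, requires_grad=True)

with torch.no_grad():
    inside = (w * 2).requires_grad      # False: no graph is built in the block
after = (w * 2).requires_grad           # True: grad mode is restored on exit

# set_grad_enabled takes a bool and can also be used as a context manager
with torch.set_grad_enabled(False):
    disabled = torch.is_grad_enabled()  # False
restored = torch.is_grad_enabled()      # True

print(inside, after, disabled, restored)
```

Using the context-manager form avoids leaving gradient tracking switched off by accident for the rest of the program.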

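To change the values of a tensor without autograd recording the change, operate on tensor.data. (Since PyTorch 0.4, tensor.detach() is generally the recommended alternative, because autograd can still detect unsafe in-place changes made through it.)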

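a = t.ones(3, 4, requires_grad=True)
a.data.requires_grad  # False: a.data shares storage with a but sits outside the computation graph
d = a.data.sigmoid_()  # sigmoid_ is an in-place operation, so it also changes the values of a
d.requires_grad  # False: autograd does not record this change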
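The difference between .data and .detach() matters when the modified tensor is needed for the backward pass. The following sketch is an assumption-laden illustration rather than part of the original notes: it uses sigmoid because its backward pass reuses the forward output.

```python
import torch

# Case 1: an in-place edit through .data is invisible to autograd
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()              # sigmoid keeps its output for the backward pass
out.data.zero_()               # silently overwrites that saved output
out.sum().backward()
silent_grad = a.grad.clone()   # all zeros: wrong, yet no error is raised

# Case 2: the same edit through .detach() is detected
b = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = b.sigmoid()
out.detach().zero_()
try:
    out.sum().backward()
    caught = False
except RuntimeError:
    caught = True              # autograd notices the in-place modification

print(silent_grad, caught)
```

So .data is convenient but can hide incorrect gradients, while .detach() fails loudly in the same situation.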

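During backpropagation, the gradient of a non-leaf node is freed as soon as it has been used. There are two ways to inspect the gradients of such intermediate variables:

Use the autograd.grad function
Use a hook

The hook approach is recommended, but in practice avoid modifying the value of grad inside the hook.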

# Consider the following
x = t.ones(3, requires_grad=True)
w = t.rand(3, requires_grad=True)
y = x * w  # y depends on w, and w.requires_grad is True
z = y.sum()

During backpropagation the gradient of y is computed, used, and then freed.

x.requires_grad, w.requires_grad, y.requires_grad  # (True, True, True)
z.backward()  # gradients of non-leaf nodes are freed once used, so y.grad is None
(x.grad, w.grad, y.grad)  # e.g. (tensor([0.2772, 0.2943, 0.0755]), tensor([1., 1., 1.]), None); x.grad equals w, which is random

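The autograd.grad approach: call grad to obtain the gradient of an intermediate variable.

t.autograd.grad(z, y)  # (tensor([1., 1., 1.]),): the gradient of z with respect to y; a backward pass is run internally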
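A self-contained version of this call might look like the sketch below, which rebuilds the graph first so that it does not rely on any earlier backward pass. Requesting several inputs at once is shown only as an illustration.

```python
import torch

x = torch.ones(3, requires_grad=True)
w = torch.rand(3, requires_grad=True)
y = x * w
z = y.sum()

# gradient of z with respect to the intermediate tensor y; the result is a tuple
(grad_y,) = torch.autograd.grad(z, y, retain_graph=True)

# several inputs can be requested at once; results come back in the same order
grad_x, grad_w = torch.autograd.grad(z, (x, w))

print(grad_y)                  # tensor([1., 1., 1.])
print(torch.equal(grad_x, w))  # True, since dz/dx = w
print(x.grad)                  # None: autograd.grad does not fill in .grad
```

Unlike backward(), autograd.grad returns the gradients instead of accumulating them into the .grad attribute of the leaves.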

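The hook approach: a hook is a function that receives the gradient as its input. It should normally return nothing, because a returned tensor would replace the gradient.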

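def variable_hook(grad):
    print('gradient of y:', grad)

y = x * w  # rebuild the graph, since the earlier z.backward() freed its buffers
z = y.sum()
hook_handle = y.register_hook(variable_hook)  # register the hook on y
z.backward()  # prints: gradient of y: tensor([1., 1., 1.])
hook_handle.remove()  # remove the hook after use unless it is needed on every pass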
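Instead of printing, a hook can store the gradient for later inspection. In the sketch below, the captured list and the save_grad function are hypothetical names introduced for illustration; the second backward pass shows that a removed hook no longer fires.

```python
import torch

x = torch.ones(3, requires_grad=True)
w = torch.rand(3, requires_grad=True)
y = x * w
z = y.sum()

captured = []                      # hypothetical container for the gradient

def save_grad(grad):
    # store a copy and return nothing, so the gradient itself is unchanged
    captured.append(grad.clone())

handle = y.register_hook(save_grad)
z.backward(retain_graph=True)      # the hook fires during this pass
handle.remove()
z.backward()                       # hook removed: nothing new is captured

print(len(captured), captured[0])  # 1 tensor([1., 1., 1.])
```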
