A detailed explanation of the backward() function and the gradient argument in PyTorch

Matrix multiplication, example 1

The following example illustrates what the gradient argument of backward() does.

Note: throughout this article, @ denotes matrix multiplication and * denotes element-wise multiplication.

Taking the partial derivative with respect to A

First attempt, code 1

import torch

# [1,2] @ [2,3] = [1,3]
A = torch.tensor([[1., 2.]], requires_grad=True)
B = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
Z = A @ B

Z.backward()  # dz1/dx1, dz2/dx1

Output 1

Traceback (most recent call last):
  File "D:\SoftProgram\JetBrains\PyCharm Community Edition 2023.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:\SoftProgram\JetBrains\PyCharm Community Edition 2023.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "E:\program\python\Bili_deep_thoughts\RNN\help\h_backward_5.py", line 12, in <module>
    Z.backward()  # dz1/dx1, dz2/dx1
  File "D:\SoftProgram\JetBrains\anaconda3_202303\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\SoftProgram\JetBrains\anaconda3_202303\lib\site-packages\torch\autograd\__init__.py", line 166, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
  File "D:\SoftProgram\JetBrains\anaconda3_202303\lib\site-packages\torch\autograd\__init__.py", line 67, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs

Looking up the cause of the error

If the Tensor is a scalar (i.e. it holds a single element), you do not need to pass any argument to backward(); if it has more elements, however, you must pass a gradient argument, which is a tensor of matching shape.

That made some sense after reading, but I still did not quite understand "a tensor of matching shape".
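As a minimal illustration of the first half of that rule (my own sketch, not part of the original post), a scalar output really does work without any argument:

import torch

# Scalar output: backward() needs no gradient argument.
A = torch.tensor([[1., 2.]], requires_grad=True)
B = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
S = (A @ B).sum()   # S holds a single number, so this is allowed
S.backward()
print(A.grad)       # tensor([[ 60., 600.]])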

Attempting a fix, code 2

import torch

# [1,2] @ [2,3] = [1,3]
A = torch.tensor([[1., 2.]], requires_grad=True)
B = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
Z = A @ B

Z.backward(torch.ones_like(A))  # dz1/dx1, dz2/dx1

Output 2

Traceback (most recent call last):
  File "D:\SoftProgram\JetBrains\PyCharm Community Edition 2023.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:\SoftProgram\JetBrains\PyCharm Community Edition 2023.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "E:\program\python\Bili_deep_thoughts\RNN\help\h_backward_5.py", line 12, in <module>
    Z.backward(torch.ones_like(A))  # dz1/dx1, dz2/dx1
  File "D:\SoftProgram\JetBrains\anaconda3_202303\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "D:\SoftProgram\JetBrains\anaconda3_202303\lib\site-packages\torch\autograd\__init__.py", line 166, in backward
    grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
  File "D:\SoftProgram\JetBrains\anaconda3_202303\lib\site-packages\torch\autograd\__init__.py", line 50, in _make_grads
    raise RuntimeError("Mismatch in shape: grad_output["
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([1, 2]) and output[0] has a shape of torch.Size([1, 3])

At this point it became clearer: "a tensor of matching shape" should more precisely read "a tensor whose shape matches the output Z".
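For completeness, here is code 2 with the shape fixed (my own sketch, not from the original post). Passing torch.ones_like(Z) weights every element of Z equally, which is the same as summing Z and then calling backward on the scalar, so A.grad matches the scalar sketch above:

import torch

# gradient now has the same shape as Z ([1, 3]), so backward succeeds.
A = torch.tensor([[1., 2.]], requires_grad=True)
B = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
Z = A @ B

Z.backward(torch.ones_like(Z))   # equivalent to Z.sum().backward()
print(A.grad)                    # tensor([[ 60., 600.]])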

Explanation

PyTorch can only differentiate a scalar directly. To differentiate a non-scalar output, the idea is to first turn it into a scalar by summing its elements and then differentiate that sum. But if all elements were simply added up with equal weight, you could no longer tell, for instance, the derivative of z2 alone with respect to a2. Hence the gradient argument: it has the same shape as the output Z, and its entries are the coefficients applied to the corresponding entries of Z when they are summed. For matrix multiplication example 1 above, this means

R = g1*z1 + g2*z2 + g3*z3,  where G = [g1, g2, g3] is the gradient argument

so the derivative of the matrix Z with respect to A is turned into the derivative of the scalar R with respect to A, i.e.

dR/dA = [dR/da1, dR/da2] = [g1*b11 + g2*b12 + g3*b13, g1*b21 + g2*b22 + g3*b23] = G @ B^T

Note: the gradient with respect to A has the same shape as A.
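This equivalence is easy to check in code. The sketch below is my own (not from the original post); it compares A.grad produced by Z.backward(G) with A.grad produced by explicitly building the scalar R = (G * Z).sum() and calling R.backward():

import torch

# Compare Z.backward(G) with building the scalar R = (G * Z).sum() by hand.
A = torch.tensor([[1., 2.]], requires_grad=True)
B = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
G = torch.tensor([[1., 0., 0.]])

Z = A @ B
Z.backward(gradient=G)
grad_from_backward = A.grad.clone()

A.grad = None                     # reset before the second backward pass
R = (G * (A @ B)).sum()           # scalar built with G as the coefficients
R.backward()
print(torch.equal(grad_from_backward, A.grad))  # True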

Verification

Compute only the derivative of z1 with respect to A, i.e. set G = [1, 0, 0], and run code 3.

import torch

# [1,2] @ [2,3] = [1,3], gradient has shape [1,3]
A = torch.tensor([[1., 2.]], requires_grad=True)
B = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
Z = A @ B

# Z.backward(torch.ones_like(A))  # fails: gradient must match Z's shape, not A's
Z.backward(gradient=torch.tensor([[1., 0., 0.]]))  # G = [1, 0, 0]: weight only z1
print('Z=', Z)
print('A.grad=', A.grad)
print('B.grad=', B.grad)

Output 3

Z= tensor([[210., 420., 630.]], grad_fn=<MmBackward0>)
A.grad= tensor([[ 10., 100.]])
B.grad= tensor([[1., 0., 0.],
        [2., 0., 0.]])

Trying to explain the gradient with respect to B as well: here R = 1*z1 + 0*z2 + 0*z3 = z1 = a1*b11 + a2*b21, so dR/db11 = a1 = 1 and dR/db21 = a2 = 2, while every other entry of B does not appear in z1 and gets gradient 0; in matrix form, B.grad = A^T @ G.
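A quick numerical check of those closed forms (my own sketch, not part of the original post):

import torch

# For Z = A @ B and the implicit scalar R = (G * Z).sum(),
# the closed forms are A.grad = G @ B^T and B.grad = A^T @ G.
A = torch.tensor([[1., 2.]], requires_grad=True)
B = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
G = torch.tensor([[1., 0., 0.]])

Z = A @ B
Z.backward(gradient=G)
print(torch.equal(A.grad, G @ B.detach().T))  # True
print(torch.equal(B.grad, A.detach().T @ G))  # True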

Hadamard product example

Code 4

import torch

# [1,3] * [2,3] broadcasts to [2,3], gradient has shape [2,3]
A1 = torch.tensor([[1., 2., 3.]], requires_grad=True)
B1 = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
Z1 = A1 * B1 
G1 = torch.zeros_like(Z1)
G1[1, 0] = 1
Z1.backward(gradient=G1)  # only Z1[1, 0] contributes to the implicit scalar
print('Z1=', Z1)
print('Z1.shape=', Z1.shape)
print('A1.grad=', A1.grad)
print('B1.grad=', B1.grad)

Output 4

Z1= tensor([[ 10.,  40.,  90.],
        [100., 400., 900.]], grad_fn=<MulBackward0>)
Z1.shape= torch.Size([2, 3])
A1.grad= tensor([[100.,   0.,   0.]])
B1.grad= tensor([[0., 0., 0.],
        [1., 0., 0.]])
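The same bookkeeping applies here. As a quick check (my own sketch, not from the original post): for the broadcasted element-wise product Z1 = A1 * B1, the implicit scalar is R = (G1 * Z1).sum(), so B1.grad = G1 * A1, while A1.grad = (G1 * B1).sum(dim=0, keepdim=True), with the extra sum coming from A1 being broadcast over the rows:

import torch

# Closed forms for the broadcasted element-wise product.
A1 = torch.tensor([[1., 2., 3.]], requires_grad=True)
B1 = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
G1 = torch.zeros(2, 3)
G1[1, 0] = 1

Z1 = A1 * B1
Z1.backward(gradient=G1)
print(torch.equal(A1.grad, (G1 * B1.detach()).sum(dim=0, keepdim=True)))  # True
print(torch.equal(B1.grad, G1 * A1.detach()))                             # True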

Matrix multiplication, example 2

Code 5

import torch

A11 = torch.tensor([[0.1, 0.2], [1., 2.]], requires_grad=True)
B11 = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
Z11 = torch.mm(A11, B11)
G11 = torch.zeros_like(Z11)
G11[1, 0] = 1
Z11.backward(G11, retain_graph=True)  # only Z11[1, 0] contributes to the implicit scalar
print('Z11=', Z11)
print('Z11.shape=', Z11.shape)
print('A11.grad=', A11.grad)
print('B11.grad=', B11.grad)

Output 5

Z11= tensor([[ 21.,  42.,  63.],
        [210., 420., 630.]], grad_fn=<MmBackward0>)
Z11.shape= torch.Size([2, 3])
A11.grad= tensor([[  0.,   0.],
        [ 10., 100.]])
B11.grad= tensor([[1., 0., 0.],
        [2., 0., 0.]])
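Because code 5 passes retain_graph=True, backward can be called repeatedly on the same graph. The sketch below (my own extension, not from the original post) loops over the entries of Z11 with one-hot gradient tensors to read off the gradient of each individual entry with respect to A11; A11.grad must be reset between calls because gradients accumulate.

import torch

# Read off d Z11[i, j] / d A11 entry by entry using one-hot gradient tensors.
A11 = torch.tensor([[0.1, 0.2], [1., 2.]], requires_grad=True)
B11 = torch.tensor([[10., 20., 30.], [100., 200., 300.]], requires_grad=True)
Z11 = torch.mm(A11, B11)

for i in range(Z11.shape[0]):
    for j in range(Z11.shape[1]):
        A11.grad = None                  # gradients accumulate, so reset first
        G = torch.zeros_like(Z11)
        G[i, j] = 1.0                    # weight only Z11[i, j]
        Z11.backward(G, retain_graph=True)
        print(f'dZ11[{i},{j}]/dA11 =', A11.grad)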

References

Pytorch中的.backward()方法 - 知乎

Pytorch autograd,backward详解 - 知乎

PyTorch中的backward - 知乎

grad can be implicitly created only for scalar outputs_一只皮皮虾x的博客-CSDN博客
