Tensor Gradients and Jacobian Products
In many cases we have a scalar loss function and need to compute the gradient with respect to some parameters. However, there are cases where the output function is an arbitrary tensor. In that case, PyTorch allows you to compute a so-called Jacobian product rather than the actual gradient.
For a vector function $\vec{y}=f\left( \vec{x} \right)$, where $\vec{x}=\left< x_1,...,x_n \right>$ and $\vec{y}=\left< y_1,...,y_m \right>$, the partial derivatives of $\vec{y}$ with respect to $\vec{x}$ are given by the Jacobian matrix:
$$J=\left( \begin{matrix} \frac{\partial y_1}{\partial x_1}& \cdots& \frac{\partial y_1}{\partial x_n}\\ \vdots& \ddots& \vdots\\ \frac{\partial y_m}{\partial x_1}& \cdots& \frac{\partial y_m}{\partial x_n}\\ \end{matrix} \right)$$
Instead of computing the Jacobian matrix itself, PyTorch allows you to compute the Jacobian product $v \cdot J$ for a given input vector $\vec{v}=\left< v_1,...,v_m \right>$. This is achieved by calling backward with $v$ as the argument. (The size of $v$ should be the same as the size of the original tensor with respect to which we want to compute the product.)
$$v \cdot J= \left( \begin{matrix} v_1& \cdots& v_m\\ \end{matrix} \right)_{1\times m} \left( \begin{matrix} \frac{\partial y_1}{\partial x_1}& \cdots& \frac{\partial y_1}{\partial x_n}\\ \vdots& \ddots& \vdots\\ \frac{\partial y_m}{\partial x_1}& \cdots& \frac{\partial y_m}{\partial x_n}\\ \end{matrix} \right)_{m\times n} = \left( \begin{matrix} \sum_i{v_i\frac{\partial y_i}{\partial x_1}}& \cdots& \sum_i{v_i\frac{\partial y_i}{\partial x_n}}\\ \end{matrix} \right)_{1\times n}$$
Test
import torch

inp = torch.eye(5, requires_grad=True)
out = (inp + 1).pow(2)
# pass v = ones_like(inp) to backward to get the vector-Jacobian product
out.backward(torch.ones_like(inp), retain_graph=True)
print("First call\n", inp.grad)
# gradients accumulate in .grad, so a second backward doubles the values
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nSecond call\n", inp.grad)
# zero the gradient first to get the correct values again
inp.grad.zero_()
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)
First call
tensor([[4., 2., 2., 2., 2.],
[2., 4., 2., 2., 2.],
[2., 2., 4., 2., 2.],
[2., 2., 2., 4., 2.],
[2., 2., 2., 2., 4.]])
Second call
tensor([[8., 4., 4., 4., 4.],
[4., 8., 4., 4., 4.],
[4., 4., 8., 4., 4.],
[4., 4., 4., 8., 4.],
[4., 4., 4., 4., 8.]])
Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
[2., 4., 2., 2., 2.],
[2., 2., 4., 2., 2.],
[2., 2., 2., 4., 2.],
[2., 2., 2., 2., 4.]])
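Note that the gradients accumulate in inp.grad, which is why the second call doubles the values and they only come out right again after inp.grad.zero_(). As a side note not in the original, if you just want the vector-Jacobian product without touching .grad, torch.autograd.grad returns it directly; a minimal sketch under the same setup:

import torch

inp = torch.eye(5, requires_grad=True)
out = (inp + 1).pow(2)
# torch.autograd.grad returns the vector-Jacobian product as a tuple
# and does not accumulate anything into inp.grad
(vjp,) = torch.autograd.grad(out, inp, grad_outputs=torch.ones_like(out))
print(vjp)  # same values as "First call" above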
Example: using backward() with a non-scalar loss function
Compute the derivative of $y = 5x^2$ at each of the integers (1, 2, …, 10):
- Use a for loop to evaluate $y$ at each integer value of $x$ and call backward() on each value as a scalar (scalar case)
- Alternatively, pass a weight vector $v$ (here $v = (1, 1, \dots, 1)$, matching the code below) to backward() and compute all the derivatives in a single call (non-scalar case)
x = torch.range(1, 10, 1, requires_grad=True)
# x1 = torch.randn(91, requires_grad=True)
l = []
for i in x:
    y = 5 * i**2
    l.append(y.detach().numpy())
    # compute gradients: each scalar backward adds 10*x_i to x.grad[i];
    # .grad is never zeroed, so the entries accumulate across iterations
    y.backward()
    # print out the gradients
    print(x.grad)
tensor([10., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
tensor([10., 20., 0., 0., 0., 0., 0., 0., 0., 0.])
tensor([10., 20., 30., 0., 0., 0., 0., 0., 0., 0.])
tensor([10., 20., 30., 40., 0., 0., 0., 0., 0., 0.])
tensor([10., 20., 30., 40., 50., 0., 0., 0., 0., 0.])
tensor([10., 20., 30., 40., 50., 60., 0., 0., 0., 0.])
tensor([10., 20., 30., 40., 50., 60., 70., 0., 0., 0.])
tensor([10., 20., 30., 40., 50., 60., 70., 80., 0., 0.])
tensor([10., 20., 30., 40., 50., 60., 70., 80., 90., 0.])
tensor([ 10., 20., 30., 40., 50., 60., 70., 80., 90., 100.])
C:\Users\pandas\AppData\Local\Temp/ipykernel_1728/2678808987.py:1: UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
x = torch.range(1, 10, 1 ,requires_grad=True)
# x.grad.zero_()
x = torch.range(1, 10, 1, requires_grad=True)
y = 5 * x.T * x      # x is 1-D, so .T is a no-op: y_i = 5 * x_i**2 elementwise
v = [1] * 10
v = torch.tensor(v, dtype=torch.float)
# y.backward(torch.ones_like(y))
y.backward(v)        # v·J with a diagonal Jacobian: dy_i/dx_i = 10 * x_i
x.grad
C:\Users\pandas\AppData\Local\Temp/ipykernel_1728/1828972091.py:2: UserWarning: torch.range is deprecated and will be removed in a future release because its behavior is inconsistent with Python's range builtin. Instead, use torch.arange, which produces values in [start, end).
x = torch.range(1, 10, 1 ,requires_grad=True)
tensor([ 10., 20., 30., 40., 50., 60., 70., 80., 90., 100.])
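For completeness (this check is an addition, not part of the original notes): since $y_i = 5x_i^2$ depends only on $x_i$, the Jacobian here is diagonal with entries $10x_i$, so multiplying it by a vector of ones reproduces the result above. torch.autograd.functional.jacobian makes this explicit:

import torch
from torch.autograd.functional import jacobian

x = torch.arange(1., 11.)            # torch.arange avoids the torch.range deprecation warning
J = jacobian(lambda t: 5 * t**2, x)  # 10x10 diagonal Jacobian, diagonal entries 10 * x_i
v = torch.ones(10)
print(v @ J)                         # tensor([ 10., 20., ..., 100.])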
A slightly more complex multi-dimensional gradient computation
import torch
from torch.autograd import Variable  # Variable is deprecated since PyTorch 0.4; a tensor with requires_grad=True suffices
x = torch.Tensor([[1., 2., 3.], [4., 5., 6.]])
x = Variable(x, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
print(x)
print(y)
print(z)
print(out)
tensor([[1., 2., 3.],
[4., 5., 6.]], requires_grad=True)
tensor([[3., 4., 5.],
[6., 7., 8.]], grad_fn=<AddBackward0>)
tensor([[ 27., 48., 75.],
[108., 147., 192.]], grad_fn=<MulBackward0>)
tensor(99.5000, grad_fn=<MeanBackward0>)
print(x.grad_fn)
print(y.grad_fn)
print(z.grad_fn)
print(out.grad_fn)
None
<AddBackward0 object at 0x0000013BFE723BE0>
<MulBackward0 object at 0x0000013BFE712FD0>
<MeanBackward0 object at 0x0000013BFE723BE0>
out.backward()
x.grad
tensor([[3., 4., 5.],
[6., 7., 8.]])
Analysis:
Relate out to x via the chain rule for gradients.
$$out = \frac{1}{6}\sum_i{z_i}$$
$$z_i = 3 \times (x_i+2)^2$$
$$\frac{\partial out}{\partial x_i} = \frac{1}{6} \times 3 \times 2 \times (x_i+2) = x_i+2$$
Therefore:
$$\left( \begin{matrix} 1& 2& 3\\ 4& 5& 6\\ \end{matrix} \right) \Rightarrow \left( \begin{matrix} 3& 4& 5\\ 6& 7& 8\\ \end{matrix} \right)$$
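As a closing remark (an addition, not from the original notes), Variable has been merged into Tensor since PyTorch 0.4, so the same gradient can be obtained without it; a minimal equivalent sketch:

import torch

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()
print(x.grad)  # tensor([[3., 4., 5.], [6., 7., 8.]]), i.e. x + 2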