1. Introduction
The torch.autograd module provides the classes and functions that implement automatic differentiation of arbitrary scalar-valued functions. Set requires_grad=True on a tensor, run the relevant computation, and the gradient (derivative) information accumulated during the backward pass can be read out.
2. Case Studies
Analysis 1
Create a matrix tensor $x$ in PyTorch, let $y = \mathrm{sum}(x^2+2x+1)$, and compute the derivative of $y$ with respect to $x$:
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)  # track gradients on x
y = torch.sum(x ** 2 + 2 * x + 1)  # reduce to a scalar
y.backward()  # compute dy/dx into x.grad
print(x.grad)
>>>tensor([[ 4., 6.],
[ 8., 10.]])
First we create a $[2\times 2]$ tensor $x$ and compute $y = \mathrm{sum}(x^2+2x+1)$; the resulting $y$ is a scalar value. Written out element by element:
$$y=(x_{11}^2+2x_{11}+1)+(x_{12}^2+2x_{12}+1)+(x_{21}^2+2x_{21}+1)+(x_{22}^2+2x_{22}+1)$$
Calling y.backward() then automatically computes the derivative of $y$ with respect to each element of $x$, i.e.:
$$\left[ \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}}\\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} \end{matrix} \right]= \left[ \begin{matrix} 2x_{11} + 2 & 2x_{12} + 2\\ 2x_{21} + 2 & 2x_{22} + 2 \end{matrix} \right]$$
Substituting the values of $x$ gives the result above:
[[ 4., 6.],
[ 8., 10.]]
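As a quick sanity check, the hand-derived formula $\partial y/\partial x = 2x + 2$ can be compared against what autograd produces (a minimal sketch of the same example):

```python
import torch

# Recompute the example, then compare autograd's gradient with the
# analytic derivative dy/dx = 2x + 2 derived above.
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = torch.sum(x ** 2 + 2 * x + 1)
y.backward()

analytic = 2 * x.detach() + 2  # derivative computed by hand
print(torch.allclose(x.grad, analytic))  # → True
```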
Analysis 2
Why is sum() needed? Can it be dropped, like this?

$$y = x^2+2x+1$$
Let's try:
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward()
print(x.grad)
This raises a RuntimeError whose message says, roughly, that .backward() without arguments can only be used on a scalar output.
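The failure can be reproduced and caught explicitly; note that because the error is raised before any backward computation runs, x.grad is never populated (a sketch):

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1  # y is a 2x2 tensor, not a scalar

try:
    y.backward()  # no gradient argument: only valid for scalar outputs
except RuntimeError as e:
    print(e)  # message about grad only being implicitly created for scalar outputs

print(x.grad)  # still None: the failed backward wrote nothing
```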
After consulting the documentation, the code can be revised:
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward(gradient=torch.tensor([[1.0, 1], [1, 1]]))
print(x.grad)
>>>tensor([[ 4., 6.],
[ 8., 10.]])
Passing a $[2\times 2]$ tensor of ones to y.backward() makes the code run successfully. It can be understood like this:
$$\left[ \begin{matrix} y_{11} & y_{12}\\ y_{21} & y_{22} \end{matrix} \right]= \left[ \begin{matrix} x_{11}^2+2x_{11}+1 & x_{12}^2+2x_{12}+1\\ x_{21}^2+2x_{21}+1 & x_{22}^2+2x_{22}+1 \end{matrix} \right]$$
Here $y$ is no longer a scalar but a matrix, and the differentiation happens element-wise:
$$\left[ \begin{matrix} \frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial y_{12}}{\partial x_{12}}\\ \frac{\partial y_{21}}{\partial x_{21}} & \frac{\partial y_{22}}{\partial x_{22}} \end{matrix} \right]= \left[ \begin{matrix} 2x_{11} + 2 & 2x_{12} + 2\\ 2x_{21} + 2 & 2x_{22} + 2 \end{matrix} \right]$$
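The gradient argument can also be read as the vector $v$ in a vector–Jacobian product: y.backward(gradient=v) computes the same gradient as reducing $v \cdot y$ to a scalar and calling backward on that. A sketch of the equivalence, reusing the tensor of ones from above:

```python
import torch

v = torch.tensor([[1.0, 1], [1, 1]])  # the weighting tensor passed to backward

# Route 1: non-scalar backward with an explicit gradient argument.
x1 = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y1 = x1 ** 2 + 2 * x1 + 1
y1.backward(gradient=v)

# Route 2: weight y by v, reduce to a scalar, then call backward.
x2 = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y2 = x2 ** 2 + 2 * x2 + 1
torch.sum(v * y2).backward()

print(torch.equal(x1.grad, x2.grad))  # → True
```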
If the tensor of ones is multiplied by 2, i.e. y.backward(gradient=torch.tensor([[2.0, 2], [2, 2]])):
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
y.backward(gradient=torch.tensor([[2.0, 2], [2, 2]]))
print(x.grad)
>>>tensor([[ 8., 12.],
[16., 20.]])
This is equivalent to:
$$\left[ \begin{matrix} y_{11} & y_{12}\\ y_{21} & y_{22} \end{matrix} \right]= \left[ \begin{matrix} 2(x_{11}^2+2x_{11}+1) & 2(x_{12}^2+2x_{12}+1)\\ 2(x_{21}^2+2x_{21}+1) & 2(x_{22}^2+2x_{22}+1) \end{matrix} \right]$$
So each derivative is also twice its original value:
$$\left[ \begin{matrix} \frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial y_{12}}{\partial x_{12}}\\ \frac{\partial y_{21}}{\partial x_{21}} & \frac{\partial y_{22}}{\partial x_{22}} \end{matrix} \right]= \left[ \begin{matrix} 4x_{11} + 4 & 4x_{12} + 4\\ 4x_{21} + 4 & 4x_{22} + 4 \end{matrix} \right]$$
Analysis 3
With a little imagination, the two previous cases can be combined.
import torch
x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
z = torch.sum(x ** 2 + 2 * x + 1)
y.backward(gradient=torch.tensor([[1.0, 1], [1, 1]]))
z.backward()
print(x.grad)
>>>tensor([[ 8., 12.],
[16., 20.]])
Backpropagating through y and then z, the printed result is the sum of the two gradients. The formula is:
$$\left[ \begin{matrix} \frac{\partial z}{\partial x_{11}} + \frac{\partial y_{11}}{\partial x_{11}} & \frac{\partial z}{\partial x_{12}} + \frac{\partial y_{12}}{\partial x_{12}}\\ \frac{\partial z}{\partial x_{21}} + \frac{\partial y_{21}}{\partial x_{21}} & \frac{\partial z}{\partial x_{22}} + \frac{\partial y_{22}}{\partial x_{22}} \end{matrix} \right]= \left[ \begin{matrix} 2x_{11} + 2 & 2x_{12} + 2\\ 2x_{21} + 2 & 2x_{22} + 2 \end{matrix} \right]+ \left[ \begin{matrix} 2x_{11} + 2 & 2x_{12} + 2\\ 2x_{21} + 2 & 2x_{22} + 2 \end{matrix} \right]$$
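The summed result comes from .backward() accumulating into x.grad rather than overwriting it. To recover each gradient separately, the buffer can be cleared between the two calls (a sketch):

```python
import torch

x = torch.tensor([[1.0, 2], [3, 4]], requires_grad=True)
y = x ** 2 + 2 * x + 1
z = torch.sum(x ** 2 + 2 * x + 1)

y.backward(gradient=torch.ones_like(x))
first = x.grad.clone()   # gradient contributed by y alone

x.grad.zero_()           # clear the accumulated gradient buffer
z.backward()
second = x.grad.clone()  # gradient contributed by z alone

print(first + second)    # the two contributions sum to the accumulated result
```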