Repost: a Zhihu explanation of the vector-Jacobian product in PyTorch autograd

Latest recommended article published on 2023-03-08 14:51:49

gstc123


Original link: http://www.baidu.com/link?url=Da-VPq8C-hQjHFTjBDoYtv41HD-uTNcx5xNS4BLGAQKCkLnYWGA0A2_iFnEeTYGg&wd=&eqid=ead6dfba000092e3000000065df8c634

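First, my own understanding: we have $y = f(x)$, where both $y$ and $x$ are vectors.

The derivative $J$ of $y$ with respect to $x$ is a Jacobian matrix, and what PyTorch actually computes is the vector-Jacobian product,

that is, $J^{T} v = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \frac{\partial y_3}{\partial x} \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$, where each $\frac{\partial y_j}{\partial x}$ is a column vector.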

A detailed explanation of the vector-Jacobian product in PyTorch automatic differentiation

mathmad

Mathematics, artificial intelligence, classic literature

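Autograd is one of PyTorch's heavy weapons, and the key to understanding it is understanding the vector-Jacobian product.

Take a three-dimensional vector-valued function as an example: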

$X = [x_1, x_2, x_3], \quad Y = X^2$

This is evaluated with the tensor element-wise mechanism, so what it actually denotes is:

$Y = [y_1 = x_1^2,\ y_2 = x_2^2,\ y_3 = x_3^2]$

The derivative of $Y$ with respect to $X$ is not $2X$ but a Jacobian matrix (because $X$ and $Y$ are vectors, not scalars): $J = \left ( \begin{array}{ccc} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \frac{\partial y_1}{\partial x_3} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_2}{\partial x_3} \\ \frac{\partial y_3}{\partial x_1} & \frac{\partial y_3}{\partial x_2} & \frac{\partial y_3}{\partial x_3} \end{array} \right ) = \left ( \begin{array}{ccc} 2x_1 & 0 & 0 \\ 0 & 2x_2 & 0 \\ 0 & 0 & 2x_3 \end{array} \right )$
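As a quick check of the matrix above, here is a minimal sketch that asks PyTorch for the full Jacobian of the element-wise square. It assumes a PyTorch version that provides `torch.autograd.functional.jacobian`; the helper name `square` and the sample point are chosen purely for illustration and are not part of the original article.

```python
import torch
from torch.autograd.functional import jacobian


def square(x):
    # Element-wise square: y_i = x_i ** 2
    return x ** 2


x = torch.tensor([1.0, 2.0, 3.0])  # sample point, chosen for illustration
J = jacobian(square, x)            # shape (3, 3): J[j, i] = d y_j / d x_i

# For an element-wise function the Jacobian is diagonal, here diag(2 * x).
print(J)
print(torch.allclose(J, torch.diag(2 * x)))
```

Building the full matrix like this is fine for three inputs, but it is exactly the cost that the vector-Jacobian product avoids for large tensors.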

Here $y_1 = f_1(x_1, x_2, x_3) = x_1^2$ is a function of $(x_1, x_2, x_3)$, not merely of $x_1$. The special diagonal form comes from the element-wise operation; the same holds for $y_2$ and $y_3$.

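The derivative (rate of change) of $Y$ with respect to each component $x_i$ is the accumulation, along some direction $v$, of the partial derivatives of the component functions $y_j, \quad j = 1, 2, 3$ with respect to $x_i$. The usual choice of direction is $v = (1, 1, 1)$; note that when $Y$ is not a scalar, PyTorch expects $v$ to be passed to backward() explicitly.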

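Of course, you can also pass in a different direction, which is what the official documentation describes as being able to easily feed in an external gradient.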

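Here, $v$ can be understood in two ways:

as the direction onto which the vector of partial derivatives with respect to $x_i$ is projected;

or as the weights given to the partial derivatives of the component functions with respect to $x_i$.

Once $v$ is fixed, the weights for every $x_i$ are fixed as well, and they are the same for each $x_i$.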
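The weighting interpretation can be sketched without PyTorch at all. The function `vjp` below is a hypothetical helper written for this illustration (it is not a PyTorch API), and the Jacobian is that of $Y = X^2$ at the illustrative point $X = (1, 2, 3)$.

```python
def vjp(v, jac):
    """Return the row vector v @ jac (equivalently J^T v) using plain lists."""
    n_inputs = len(jac[0])
    # Entry i sums d y_j / d x_i over all outputs j, each weighted by v[j].
    return [sum(v[j] * jac[j][i] for j in range(len(v))) for i in range(n_inputs)]


# Jacobian of Y = X^2 at X = (1, 2, 3), i.e. diag(2, 4, 6).
jac = [[2.0, 0.0, 0.0],
       [0.0, 4.0, 0.0],
       [0.0, 0.0, 6.0]]

uniform = vjp([1.0, 1.0, 1.0], jac)   # every y_j weighted equally
weighted = vjp([3.0, 2.0, 1.0], jac)  # y_1 counts three times as much as y_3
print(uniform, weighted)
```

The same weight vector is applied to every column of the Jacobian, which is why the weights do not depend on which $x_i$ is being differentiated.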

Let's experiment:

1 --- A simple implicit Jacobian

>>> import torch
>>> x = torch.randn(3, requires_grad = True)
>>> x
tensor([-0.9238,  0.4353, -1.3626], requires_grad=True)
>>> y = x**2
>>> y.backward(torch.ones(3))
>>> x.grad
tensor([-1.8476,  0.8706, -2.7252])
>>> x
tensor([-0.9238,  0.4353, -1.3626], requires_grad=True)
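One way to read the result above: projecting onto $v = (1, 1, 1)$ gives the same gradient as summing the outputs into a scalar first. The sketch below compares the two routes using `torch.autograd.grad`, which returns the gradients instead of accumulating them into `.grad`; the input values are copied from the session above.

```python
import torch

x = torch.tensor([-0.9238, 0.4353, -1.3626], requires_grad=True)

# Route 1: vector-Jacobian product with v = (1, 1, 1).
(g_vjp,) = torch.autograd.grad(x ** 2, x, grad_outputs=torch.ones(3))

# Route 2: reduce to a scalar first, then take an ordinary gradient.
(g_sum,) = torch.autograd.grad((x ** 2).sum(), x)

print(g_vjp, g_sum)
```

Both routes should agree with $2x$, since each output depends only on its own input here.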

2 --- Verification with a simple explicit Jacobian

>>> x1=torch.tensor(1, requires_grad=True, dtype = torch.float)
>>> x2=torch.tensor(2, requires_grad=True, dtype = torch.float)
>>> x3=torch.tensor(3, requires_grad=True, dtype = torch.float)
>>> y=torch.randn(3) # produce a random vector for vector function define
>>> y[0]=x1**2+2*x2+x3 # define each vector function
>>> y[1]=x1+x2**3+x3**2
>>> y[2]=2*x1+x2**2+x3**3
>>> y.backward(torch.ones(3))
>>> x1.grad
tensor(5.)
>>> x2.grad
tensor(18.)
>>> x3.grad
tensor(34.)

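The Jacobian matrix for the code above is: $J = \left ( \begin{array}{ccc} 2x_1 & 2 & 1 \\ 1 & 3x_2^2 & 2x_3 \\ 2 & 2x_2 & 3x_3^2 \end{array} \right )$

The component functions are: $\begin{cases} y_1=x_1^2+2x_2+x_3 \\ y_2=x_1+x_2^3+x_3^2 \\ y_3=2x_1+x_2^2+x_3^3 \end{cases}$

Projection direction: $v = (1, 1, 1)$

Treating $v$ as a row vector and evaluating at $(x_1, x_2, x_3) = (1, 2, 3)$: $v J = [2x_1+1+2,\ 2+3x_2^2+2x_2,\ 1+2x_3+3x_3^2] = [5, 18, 34]$

The code output and the analysis confirm each other.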
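For an independent cross-check that does not rely on autograd, the Jacobian can be approximated with central finite differences. This is only a numerical sketch in plain Python; `f` and `numeric_jacobian` are names introduced here for illustration, and the step size `h` is an arbitrary assumption.

```python
def f(x1, x2, x3):
    # The three component functions from the example above.
    return [x1**2 + 2*x2 + x3,
            x1 + x2**3 + x3**2,
            2*x1 + x2**2 + x3**3]


def numeric_jacobian(func, point, h=1e-5):
    """Approximate J[j][i] = d y_j / d x_i with central differences."""
    n_out = len(func(*point))
    jac = [[0.0] * len(point) for _ in range(n_out)]
    for i in range(len(point)):
        up = list(point)
        up[i] += h
        down = list(point)
        down[i] -= h
        f_up, f_down = func(*up), func(*down)
        for j in range(n_out):
            jac[j][i] = (f_up[j] - f_down[j]) / (2 * h)
    return jac


J_num = numeric_jacobian(f, [1.0, 2.0, 3.0])
v = [1.0, 1.0, 1.0]
# v J: each column i of J, weighted by v and summed over the outputs j.
vJ = [sum(v[j] * J_num[j][i] for j in range(3)) for i in range(3)]
print([round(g, 4) for g in vJ])
```

Up to floating-point error, the printed values should sit next to the analytic result $[5, 18, 34]$.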

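3 --- Projecting onto a different direction

Analysis first, with $v = (3, 2, 1)$ and $(x_1, x_2, x_3) = (1, 2, 3)$:

$v J = [3 \cdot 2x_1 + 2 \cdot 1 + 1 \cdot 2,\ 3 \cdot 2 + 2 \cdot 3x_2^2 + 1 \cdot 2x_2,\ 3 \cdot 1 + 2 \cdot 2x_3 + 1 \cdot 3x_3^2] = [10, 34, 42]$

Then verify with code: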

>>> x1=torch.tensor(1, requires_grad=True, dtype = torch.float)
>>> x2=torch.tensor(2, requires_grad=True, dtype = torch.float)
>>> x3=torch.tensor(3, requires_grad=True, dtype = torch.float)
>>> y=torch.randn(3)
>>> y[0]=x1**2+2*x2+x3
>>> y[1]=x1+x2**3+x3**2
>>> y[2]=2*x1+x2**2+x3**3
>>> v=torch.tensor([3,2,1],dtype=torch.float)
>>> y.backward(v)
>>> x1.grad
tensor(10.)
>>> x2.grad
tensor(34.)
>>> x3.grad
tensor(42.)

They match!
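Since any direction can be fed in, one-hot directions recover the Jacobian one row at a time. The sketch below is an illustration rather than the article's own method: it packs the three inputs into a single tensor `x`, builds `y` with `torch.stack` instead of in-place assignment, and uses `torch.autograd.grad` so that nothing accumulates in `.grad`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x1, x2, x3 = x[0], x[1], x[2]
y = torch.stack([x1**2 + 2*x2 + x3,
                 x1 + x2**3 + x3**2,
                 2*x1 + x2**2 + x3**3])

rows = []
for j in range(3):
    e_j = torch.zeros(3)
    e_j[j] = 1.0  # one-hot direction that selects y_j
    # retain_graph=True keeps the graph alive for the next iteration.
    (row,) = torch.autograd.grad(y, x, grad_outputs=e_j, retain_graph=True)
    rows.append(row)

J = torch.stack(rows)  # row j holds the partial derivatives of y_j
v = torch.tensor([3.0, 2.0, 1.0])
print(J)
print(v @ J)
```

Multiplying the recovered matrix by $v = (3, 2, 1)$ reproduces the gradients from the session above, which shows that a single backward pass with $v$ is a shortcut for this row-by-row computation.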

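Summary

Since $v$ acts as the weights, or the projection direction, for the vector function, its size must match the number of component functions.
If the final function value is a scalar, the "vector function" has only one component, so $v$ can be omitted; it then defaults to 1.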
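Both summary points can be checked with a short sketch. It assumes current PyTorch behaviour, where a direction whose shape does not match the output is rejected with a `RuntimeError`; the variable names are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: no argument is needed.
(x ** 2).sum().backward()
grad_default = x.grad.clone()

x.grad = None  # reset, because .grad accumulates across backward calls
# Same scalar output, with the weight 1 passed explicitly.
(x ** 2).sum().backward(torch.tensor(1.0))
grad_explicit = x.grad.clone()

print(grad_default, grad_explicit)

# Vector output with a direction of the wrong size is rejected.
try:
    (x ** 2).backward(torch.ones(2))
    mismatch_rejected = False
except RuntimeError:
    mismatch_rejected = True
print(mismatch_rejected)
```

Resetting `x.grad` between the two calls matters: without it the second result would be the sum of both gradients.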

