Abstract: This note tests paddle.autograd.backward in paddle, and also tests the backward gradients of several commonly used functions.
Keywords: gradient
§01 Automatic gradient computation
For the paddle.autograd.backward function in paddle, see the documentation of paddle.autograd.backward.
1.1 The backward function
1.1.1 Usage notes
Function signature: def backward(tensors, grad_tensors=None, retain_graph=False)
Computes the backward gradients of the given tensors.
(1) Parameters
Args:
tensors (list of Tensors): the tensors for which the gradients are to be computed. The list can not contain the same tensor more than once.
grad_tensors (list of Tensors or None, optional): the init gradients of the ``tensors``. If not None, it must have the same length as ``tensors``,
and if any of the elements is None, then the init gradient is the default value, which is filled with 1.0.
If None, all the gradients of the ``tensors`` are the default value, which is filled with 1.0.
Defaults to None.
retain_graph (bool, optional): If False, the graph used to compute grads will be freed. If you would
like to add more ops to the built graph after calling this method (:code:`backward`), set the parameter
:code:`retain_graph` to True; then the grads will be retained. Thus, setting it to False is much more memory-efficient.
Defaults to False.
Returns:
NoneType: None
(2) Example
Examples:
.. code-block:: python
import paddle
x = paddle.to_tensor([[1, 2], [3, 4]], dtype='float32', stop_gradient=False)
y = paddle.to_tensor([[3, 2], [3, 4]], dtype='float32')
grad_tensor1 = paddle.to_tensor([[1,2], [2, 3]], dtype='float32')
grad_tensor2 = paddle.to_tensor([[1,1], [1, 1]], dtype='float32')
z1 = paddle.matmul(x, y)
z2 = paddle.matmul(x, y)
paddle.autograd.backward([z1, z2], [grad_tensor1, grad_tensor2], True)
print(x.grad)
#[[12. 18.]
# [17. 25.]]
x.clear_grad()
paddle.autograd.backward([z1, z2], [grad_tensor1, None], True)
print(x.grad)
#[[12. 18.]
# [17. 25.]]
x.clear_grad()
paddle.autograd.backward([z1, z2])
print(x.grad)
#[[10. 14.]
# [10. 14.]]
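In the last call of this docstring example, grad_tensors defaults to all-ones, so the gradient of x should be the all-ones matrix (contributed once by z1 and once by z2) multiplied by the transpose of y. A quick numpy check (added here; the local variable names are only illustrative):
import numpy as np
y = np.array([[3., 2.], [3., 4.]])
ones = np.ones((2, 2))
print(2 * ones.dot(y.T))
# expected: [[10. 14.]
#            [10. 14.]]  -- matches the x.grad printed above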
1.1.2 Hands-on tests
(1) Single-vector case
Ⅰ. Define the variables and the function
import sys,os,math,time
import matplotlib.pyplot as plt
from numpy import *
import paddle
from paddle import to_tensor as TT
x = TT([1], dtype='float32', stop_gradient=False)
y = TT([2], dtype='float32')
z = paddle.matmul(x, y)
print("x: {}".format(x),"y: {}".format(y),"z: {}".format(z))
x: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[1.])
y: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=True,
[2.])
z: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[2.])
Ⅱ. Before calling backward
print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x.grad: None
y.grad: None
z.grad: None
Ⅲ. After calling backward
paddle.autograd.backward(z, TT([3], dtype='float32'))
x.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[6.])
y.grad: None
z.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[3.])
The code above shows that $z = x \cdot y$, so $\partial x = \partial z \cdot y$. Since $\partial z = 3$ and $y = 2$, we get $\partial x = 6$.
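As a cross-check (not part of the original test), the same value can also be obtained with paddle.grad, which returns the gradient instead of accumulating it into x.grad. A minimal sketch, assuming paddle 2.x dynamic-graph mode:
import paddle
from paddle import to_tensor as TT

x = TT([1], dtype='float32', stop_gradient=False)
y = TT([2], dtype='float32')
z = paddle.matmul(x, y)

# paddle.grad returns a list with one gradient tensor per input
(dx,) = paddle.grad(outputs=[z], inputs=[x],
                    grad_outputs=[TT([3], dtype='float32')])
print(dx)   # expected: [6.], i.e. the init gradient 3 times y = 2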
Ⅳ. Calling backward again
Calling backward a second time makes the framework raise an error:
RuntimeError: (Unavailable) auto_0_ trying to backward through the same graph a second time, but this graph have already been freed. Please specify Tensor.backward(retain_graph=True) when calling backward at the first time.
[Hint: Expected var->GradVarBase()->GraphIsFreed() == false, but received var->GradVarBase()->GraphIsFreed():1 != false:0.] (at /paddle/paddle/fluid/imperative/basic_engine.cc:74)
Even after calling clear_grad(), backward() still cannot be run again:
x.clear_grad()
y.clear_grad()
z.clear_grad()
Set retain_graph to True in autograd.backward:
paddle.autograd.backward(z, TT([3], dtype='float32'), retain_graph=True)
With this setting, backward can be called repeatedly.
First call:
print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[6.])
y.grad: None
z.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[3.])
On the second call, the gradient of x accumulates (6 + 6 = 12):
x.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[12.])
y.grad: None
z.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
[3.])
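To keep repeated calls from accumulating, the leaf gradient can be cleared in between. Continuing from the session above, a small sketch:
x.clear_grad()
paddle.autograd.backward(z, TT([3], dtype='float32'), retain_graph=True)
print("x.grad: {}".format(x.grad))
# expected: [6.] again, instead of accumulating further to [18.]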
(2) Matrix multiplication
Ⅰ. A single matrix product
- Define the matrix product: $z = x \cdot y$
- Backward gradient: $\partial x = \partial z \cdot y^T$
x = TT([[1,2],[3,4]], dtype='float32', stop_gradient=False)
y = TT([[3,2],[3,4]], dtype='float32')
z = paddle.matmul(x, y)
print("x: {}".format(x),"y: {}".format(y),"z: {}".format(z))
paddle.autograd.backward(z, TT([[1,2],[2,3]], dtype='float32'), retain_graph=True)
print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[1., 2.],
[3., 4.]])
y: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=True,
[[3., 2.],
[3., 4.]])
z: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[9. , 10.],
[21., 22.]])
x.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[7. , 11.],
[12., 18.]])
y.grad: None
z.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[1., 2.],
[2., 3.]])
This shows that automatic backward gradients for matrices follow essentially the same rule as the scalar case. As a rough check, multiply y by z.grad; note that the printout below is the transpose of x.grad (the exact relation is $\partial x = \partial z \cdot y^T$, verified in the sketch after the output):
print("y*z.grad: {}".format(y.numpy().dot(z.grad.numpy())))
y*z.grad: [[ 7. 12.]
[11. 18.]]
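As a direct verification (added here, not in the original), multiplying z.grad by the transpose of y reproduces x.grad exactly:
print("z.grad*y^T: {}".format(z.grad.numpy().dot(y.numpy().T)))
# expected: [[ 7. 11.]
#            [12. 18.]]  -- identical to x.grad above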
Ⅱ. Two matrix products
Define two matrix operations:
$z_1 = x \cdot y,\quad z_2 = x \cdot y$
Then the gradient is:
$\partial x = \left( \partial z_1 + \partial z_2 \right) \cdot y^T$
x = TT([[1,2],[3,4]], dtype='float32', stop_gradient=False)
y = TT([[3,2],[3,4]], dtype='float32')
z1 = paddle.matmul(x, y)
z2 = paddle.matmul(x, y)
print("x: {}".format(x),"y: {}".format(y),"z: {}".format(z))
paddle.autograd.backward(z1, TT([[1,2],[2,3]], dtype='float32'))
paddle.autograd.backward(z2, TT([[1,1],[1,1]], dtype='float32'))
print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[1., 2.],
[3., 4.]])
y: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=True,
[[3., 2.],
[3., 4.]])
z: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[9. , 10.],
[21., 22.]])
x.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[12., 18.],
[17., 25.]])
y.grad: None
z.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
[[1., 2.],
[2., 3.]])
(Note that z and z.grad in the printout above still refer to the tensor from the previous single-product example.) Check the gradient matrix of x:
print("y*(z1+z2).grad: {}".format(y.numpy().dot((z1.grad+z2.grad).numpy())))
y*(z1+z2).grad: [[12. 17.]
[18. 25.]]
This is the transpose of x.grad shown above; the exact check against the formula $\left( \partial z_1 + \partial z_2 \right) \cdot y^T$ is shown below.
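A sketch of that check (added here), using the summed output gradients and the transpose of y:
print("(z1.grad+z2.grad)*y^T: {}".format((z1.grad + z2.grad).numpy().dot(y.numpy().T)))
# expected: [[12. 18.]
#            [17. 25.]]  -- identical to x.grad above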
1.2 More function examples
1.2.1 square
x = TT([1,2], dtype='float32', stop_gradient=False)
z1 = paddle.square(x)
print("x: {}".format(x),"z1: {}".format(z1))
paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 4.])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[2., 4.])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 1.])
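The gradient matches the elementwise derivative of the square, d(x^2)/dx = 2x; a quick check added here:
print(2 * x.numpy())
# expected: [2. 4.]  -- identical to x.grad above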
1.2.2 exp
x = TT([1,2], dtype='float32', stop_gradient=False)
z1 = paddle.exp(x)
print("x: {}".format(x),"z1: {}".format(z1))
paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[2.71828175, 7.38905621])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[2.71828175, 7.38905621])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 1.])
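For the exponential, the gradient equals the output itself, since d(e^x)/dx = e^x; a quick check added here shows x.grad coincides with z1:
print(x.grad.numpy() - z1.numpy())
# expected: [0. 0.]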
1.2.3 log
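The code for this case is not shown in the original notes; it was presumably analogous to the previous examples, something like:
x = TT([1,2], dtype='float32', stop_gradient=False)
z1 = paddle.log(x)
print("x: {}".format(x),"z1: {}".format(z1))
paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
# the output below shows x.grad = 1/x = [1., 0.5], the derivative of log(x)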
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[0. , 0.69314718])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1. , 0.50000000])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 1.])
1.2.4 Softmax
(1) Theoretical derivation
For convenience, consider a softmax over two variables:
$$z_1 = \frac{e^{x_1}}{e^{x_1} + e^{x_2}},\quad z_2 = \frac{e^{x_2}}{e^{x_1} + e^{x_2}}$$
Then:
$$\partial x_1 = \partial z_1 \cdot \frac{e^{x_1} \cdot \left( e^{x_1} + e^{x_2} \right) - e^{x_1} \cdot e^{x_1}}{\left( e^{x_1} + e^{x_2} \right)^2} + \partial z_2 \cdot \frac{-e^{x_2} \cdot e^{x_1}}{\left( e^{x_1} + e^{x_2} \right)^2}$$
Similarly, $\partial x_2$ can be derived; it is omitted here. Simplifying gives $\partial x_1 = \left( \partial z_1 - \partial z_2 \right) \cdot \frac{e^{x_1} e^{x_2}}{\left( e^{x_1} + e^{x_2} \right)^2}$, so if $\partial z_1 = \partial z_2$, then $\partial x_1 = \partial x_2 = 0$.
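A numerical sketch (added here, not part of the original) builds the softmax Jacobian explicitly with numpy and computes the vector-Jacobian product that backward accumulates into x.grad; it predicts the results of both experiments below:
import numpy as np

def softmax_vjp(x, v):
    # s_i = exp(x_i) / sum_j exp(x_j); Jacobian J[i, j] = s_i * (delta_ij - s_j)
    s = np.exp(x) / np.exp(x).sum()
    J = np.diag(s) - np.outer(s, s)
    return J.T.dot(v)            # gradient w.r.t. x for output gradient v

x = np.array([1., 2.])
print(softmax_vjp(x, np.array([1., 1.])))   # expected: [0. 0.]
print(softmax_vjp(x, np.array([1., 2.])))   # expected: approx. [-0.19661  0.19661]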
(2) Simulation
Ⅰ. backward without initializing the gradient of z
x = TT([1,2], dtype='float32', stop_gradient=False)
z1 = paddle.exp(x) / paddle.sum(paddle.exp(x))
print("x: {}".format(x),"z1: {}".format(z1))
paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[0.26894140, 0.73105860])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[0., 0.])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 1.])
As the derivation above predicts (here $\partial z_1 = \partial z_2 = 1$), the gradient of x is exactly zero.
Ⅱ. Initializing the gradient of z
x = TT([1,2], dtype='float32', stop_gradient=False)
z1 = paddle.exp(x) / paddle.sum(paddle.exp(x))
print("x: {}".format(x),"z1: {}".format(z1))
paddle.autograd.backward(z1,TT([1,2], dtype='float32'))
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[0.26894140, 0.73105860])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[-0.19661194, 0.19661200])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
[1., 2.])
Plugging directly into the formula derived above confirms that this result is correct:
x = array([1,2])
a = exp(sum(x))/(sum(exp(x)))**2
print("a: {}".format(a))
a: 0.19661193324148188
※ Summary ※
This note tested paddle.autograd.backward in paddle, along with the backward gradients of several commonly used functions.