Automatic Gradient Computation
In deep learning, we often need to compute the gradient of a function. In TensorFlow 2.0, we can use GradientTape to compute gradients automatically.
1. A simple example
Compute the gradient of the function $y = 2x^\top x$ with respect to the column vector $x$.
import tensorflow as tf

x = tf.reshape(tf.Variable(range(4), dtype=tf.float32), (4, 1))
x
Output:
<tf.Tensor: id=10, shape=(4, 1), dtype=float32, numpy=
array([[0.],
       [1.],
       [2.],
       [3.]], dtype=float32)>
The gradient of $y = 2x^\top x$ with respect to $x$ should be $4x$.
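To see why, expand the quadratic form and differentiate elementwise:

$$y = 2x^\top x = 2\sum_{i} x_i^2, \qquad \frac{\partial y}{\partial x_i} = 4x_i \;\Longrightarrow\; \nabla_x y = 4x.$$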
Computing the gradient in TensorFlow:
with tf.GradientTape() as t:
    t.watch(x)  # track x so gradients with respect to it can be computed
    y = 2 * tf.matmul(tf.transpose(x), x)
dy_dx = t.gradient(y, x)
dy_dx
Output:
<tf.Tensor: id=30, shape=(4, 1), dtype=float32, numpy=
array([[ 0.],
       [ 4.],
       [ 8.],
       [12.]], dtype=float32)>
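As a quick sanity check (a minimal sketch, assuming the x defined above is still in scope), we can confirm the result equals $4x$ elementwise:

tf.reduce_all(dy_dx == 4 * x)  # should evaluate to True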
2. Training mode and prediction mode
By default, the resources held by a GradientTape are released as soon as gradient() is called once. Passing persistent=True lets us call gradient() multiple times on the same tape, for example to differentiate several intermediate results:
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
    dz_dx = g.gradient(z, x)  # gradient of z = x^4 with respect to x: 4x^3
    dy_dx = g.gradient(y, x)  # gradient of y = x^2 with respect to x: 2x
dz_dx, dy_dx
Output:
WARNING:tensorflow:Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). Only call GradientTape.gradient inside the context if you actually want to trace the gradient in order to compute higher order derivatives.
WARNING:tensorflow:Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). Only call GradientTape.gradient inside the context if you actually want to trace the gradient in order to compute higher order derivatives.
(<tf.Tensor: id=41, shape=(4, 1), dtype=float32, numpy=
 array([[  0.],
        [  4.],
        [ 32.],
        [108.]], dtype=float32)>,
 <tf.Tensor: id=47, shape=(4, 1), dtype=float32, numpy=
 array([[0.],
        [2.],
        [4.],
        [6.]], dtype=float32)>)
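The warnings above come from calling gradient() inside the tape's context. A minimal sketch of the variant the warning recommends (same x as before; del is how TensorFlow's guide drops a persistent tape once it is no longer needed):

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = x * x
    z = y * y
# calling gradient() outside the with-block avoids the warning
dz_dx = g.gradient(z, x)
dy_dx = g.gradient(y, x)
del g  # drop the reference so the tape's resources can be released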
3. Gradients through Python control flow
Even if the computation of a function involves Python control flow (such as conditionals and loops), we may still be able to compute the gradient of a variable.
I did not fully understand this part myself, so I am not including the code here. If you need it, please follow the link to the original article.
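For reference, here is a minimal sketch of the kind of example typically used to demonstrate this (the function f and variable a below are illustrative assumptions, not the original article's code). Both the number of loop iterations and the branch taken depend on the value of the input, yet the tape still records the operations that actually execute:

def f(a):
    b = a * 2
    # how many times the loop runs depends on the value of a
    while tf.norm(b) < 1000:
        b = b * 2
    # which branch is taken also depends on the value of a
    if tf.reduce_sum(b) > 0:
        c = b
    else:
        c = 100 * b
    return c

a = tf.random.normal((1, 1), dtype=tf.float32)
with tf.GradientTape() as t:
    t.watch(a)
    c = f(a)
da = t.gradient(c, a)
# f is piecewise linear in a, i.e. c = k * a for some scalar k,
# so the gradient should equal c / a
tf.reduce_all(da == c / a)  # should evaluate to True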