Gradient Descent
Introduction
Gradient
The gradient is a vector: the direction along which the directional derivative of a function at a given point attains its maximum value.
$$\mathrm{grad}\,f(x,y) = \nabla f(x,y) = \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y}\right) = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j}$$
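As a quick worked example (the function below is chosen here purely for illustration), take $f(x,y) = x^2 + y^2$:

$$\nabla f(x,y) = (2x,\ 2y), \qquad \nabla f(1,2) = (2,\ 4)$$

so at the point $(1,2)$ the function increases fastest along the direction $(2,4)$.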
Optimizing with the Gradient
The gradient points in the direction in which the function increases fastest, so searching for a function's minimum amounts to repeatedly moving in the direction of the negative gradient:
$$\theta_{t+1} = \theta_t - \alpha_t \nabla f(\theta_t)$$
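A minimal sketch of this update rule in plain Python, assuming a toy objective $f(\theta) = (\theta - 3)^2$ and a fixed step size (both chosen only for illustration):

```python
def grad_f(theta):
    # gradient of f(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter
alpha = 0.1   # fixed learning rate alpha_t
for t in range(100):
    theta = theta - alpha * grad_f(theta)   # theta_{t+1} = theta_t - alpha_t * grad f(theta_t)

print(theta)  # approaches the minimizer theta = 3
```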
AutoGrad with TensorFlow
GradientTape
- with tf.GradientTape() as tape:
- Build computation graph
- $loss = f_\theta(x)$
- [w_grad] = tape.gradient(loss, [w]) (a minimal sketch of this workflow follows below)
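A minimal end-to-end sketch of that workflow, using a tf.Variable for w so the tape tracks it automatically; the linear model and squared-error loss below are assumptions made only for illustration:

```python
import tensorflow as tf

w = tf.Variable(1.0)              # trainable parameter, watched automatically
x = tf.constant(3.0)
y_true = tf.constant(6.0)

with tf.GradientTape() as tape:
    y_pred = w * x                      # build the computation graph
    loss = tf.square(y_pred - y_true)   # loss = f_theta(x)

[w_grad] = tape.gradient(loss, [w])     # d(loss)/dw
print(w_grad)                           # tf.Tensor(-18.0, shape=(), dtype=float32)
```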
```python
import tensorflow as tf

w = tf.constant(1.)
b = tf.constant(2.)
x = tf.constant(3.)
y = w * x                        # computed before the tape starts recording

with tf.GradientTape() as tape:
    tape.watch([w])              # constants are not tracked unless watched explicitly
    y2 = w * x                   # recorded by the tape

grad1 = tape.gradient(y, [w])
print(grad1)                     # [None]: y was not computed inside the tape
grad2 = tape.gradient(y2, [w])   # RuntimeError: a non-persistent tape can only compute gradients once
```
```python
with tf.GradientTape() as tape:
    tape.watch([w])
    y2 = w * x
grad2 = tape.gradient(y2, [w])
print(grad2)                     # [<tf.Tensor: ... numpy=3.0>]: dy2/dw = x = 3
```
Persistent GradientTape
A non-persistent tape can only have gradient() called once; after that call the resources it holds (including GPU memory) are released. Enabling the persistent option solves this and allows multiple gradient() calls, but remember to release the tape manually when you are finished.
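A short sketch of the persistent variant, reusing the w and x defined above (the extra output y3 is added here just to show a second gradient() call):

```python
with tf.GradientTape(persistent=True) as tape:
    tape.watch([w])
    y2 = w * x
    y3 = w * w * x

grad_a = tape.gradient(y2, [w])   # first call: dy2/dw = x = 3
grad_b = tape.gradient(y3, [w])   # second call is allowed because persistent=True
del tape                          # release the tape's resources manually
```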