TensorFlow 2 Basics: The Automatic Differentiation Mechanism (tf.GradientTape)

Code_LT

Last modified: 2022-09-05 21:32:22


First published: 2022-08-31 22:10:11

Article link: https://blog.csdn.net/Code_LT/article/details/126610791


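Neural networks usually rely on backpropagation to compute the gradients used to update the network parameters, and deriving those gradients by hand is complicated and error-prone.

Deep learning frameworks can carry out this gradient computation for us automatically.

TensorFlow typically uses a gradient tape, tf.GradientTape, to record the forward computation, and then plays the tape backward to obtain the gradient values.

This way of differentiating with tf.GradientTape is called TensorFlow's automatic differentiation mechanism.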

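Prerequisite: using with

Python's with statement runs a block of code under a context manager. It packages the try…except…finally pattern so that setup and cleanup are handled for you, which makes such code easier to write.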

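with func() as a:
    do_something()

can be understood, roughly, as:

mgr = func()
a = mgr.__enter__()    # "as a" binds the value returned by __enter__
try:
    do_something()
except:
    ...                # mgr.__exit__ receives the exception and may handle it
finally:
    ...                # mgr.__exit__ does the cleanup work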

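See: the Runoob tutorial on the with statement

Second, in most programming languages the scope of a variable follows the structure of the code, at levels that range from small to large: block, function, class, module, package. Python, however, has no block-level scope: statement blocks such as if, for, and the with context manager do not create a scope of their own. A variable defined inside a with block can therefore be used outside it, which is why the examples in this article call tape.gradient after the with block has ended.

See: Python scope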
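
To make the two points above concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The context manager `recording` and the names `events`, `handle` and `inside` are made up for illustration. It shows that the cleanup code runs when the block ends, and that names bound inside the with block are still visible afterwards.

```python
from contextlib import contextmanager

events = []

@contextmanager
def recording(name):
    # Hypothetical context manager: logs entry and exit around the block.
    events.append(f"enter {name}")
    try:
        yield name.upper()          # the value bound by "as handle"
    finally:
        events.append(f"exit {name}")   # cleanup always runs

with recording("tape") as handle:
    inside = handle + "!"           # defined inside the with block
    events.append("body")

# No block scope: both names are still usable here.
print(handle, inside)   # TAPE TAPE!
print(events)           # ['enter tape', 'body', 'exit tape']
```

tf.GradientTape relies on the same behavior: the tape object and the tensors computed inside the with block remain available once the block has finished recording.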

1. Computing derivatives with the gradient tape

1.1 Differentiating with respect to a variable x

import tensorflow as tf
import numpy as np 

# Derivative of f(x) = a*x**2 + b*x + c

x = tf.Variable(0.0,name = "x",dtype = tf.float32)
a = tf.constant(1.0)
b = tf.constant(-2.0)
c = tf.constant(1.0)

with tf.GradientTape() as tape:
    y = a*tf.pow(x,2) + b*x + c
    
dy_dx = tape.gradient(y,x)
print(dy_dx)
# Output:
tf.Tensor(-2.0, shape=(), dtype=float32)
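
One common way to sanity-check a value returned by automatic differentiation is to compare it with a finite-difference estimate. The sketch below does this in plain Python for the same polynomial. The helper `central_difference` and the step size `h` are illustrative choices, not part of TensorFlow.

```python
def f(x, a=1.0, b=-2.0, c=1.0):
    # Same polynomial as in the TensorFlow example above.
    return a * x**2 + b * x + c

def central_difference(func, x, h=1e-4):
    # Numerical estimate of the derivative: (f(x+h) - f(x-h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

approx = central_difference(f, 0.0)
exact = 2 * 1.0 * 0.0 + (-2.0)    # analytic derivative 2*a*x + b at x = 0

# The two values should agree up to floating-point rounding (about -2).
print(approx, exact)
```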

1.2 Differentiating with respect to constants

# Constant tensors can also be differentiated, but they must be watched explicitly

with tf.GradientTape() as tape:
    tape.watch([a,b,c])
    y = a*tf.pow(x,2) + b*x + c
    
dy_dx,dy_da,dy_db,dy_dc = tape.gradient(y,[x,a,b,c])
print(dy_da)
print(dy_dc)
# Output:
tf.Tensor(0.0, shape=(), dtype=float32)
tf.Tensor(1.0, shape=(), dtype=float32)
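
The sketch below shows why the watch call matters. It assumes TensorFlow 2.x in eager mode, and the values are chosen only for illustration: a trainable tf.Variable is watched automatically, while a tf.constant that is not watched yields None as its gradient.

```python
import tensorflow as tf

x = tf.Variable(3.0)     # variables are watched automatically
a = tf.constant(1.0)     # constants are not

with tf.GradientTape() as tape:
    y = a * tf.pow(x, 2)
grad_x, grad_a = tape.gradient(y, [x, a])
print(grad_x)            # 2*a*x = 6.0
print(grad_a)            # None, because a was never watched

with tf.GradientTape() as tape:
    tape.watch(a)        # now the tape records operations involving a
    y = a * tf.pow(x, 2)
grad_a_watched = tape.gradient(y, a)
print(grad_a_watched)    # x**2 = 9.0
```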

1.3 Second derivative

# Second derivatives can be computed with nested tapes
with tf.GradientTape() as tape2:
    with tf.GradientTape() as tape1:   
        y = a*tf.pow(x,2) + b*x + c
    dy_dx = tape1.gradient(y,x)   
dy2_dx2 = tape2.gradient(dy_dx,x)

print(dy2_dx2)
# Output:
tf.Tensor(2.0, shape=(), dtype=float32)
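
As a cross-check, the values printed in sections 1.1 to 1.3 can be worked out by hand. The derivation below uses only the polynomial and the constants from the examples (a = 1, b = -2, c = 1, evaluated at x = 0).

```latex
\begin{aligned}
y &= a x^{2} + b x + c \\
\frac{\partial y}{\partial x} &= 2 a x + b = 2 \cdot 1 \cdot 0 - 2 = -2 \\
\frac{\partial^{2} y}{\partial x^{2}} &= 2 a = 2 \\
\frac{\partial y}{\partial a} &= x^{2} = 0, \qquad
\frac{\partial y}{\partial b} = x = 0, \qquad
\frac{\partial y}{\partial c} = 1
\end{aligned}
```

These match the printed results: -2.0 for dy_dx, 0.0 and 1.0 for dy_da and dy_dc, and 2.0 for the second derivative.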

1.4 Using it in AutoGraph

@tf.function
def f(x):   
    a = tf.constant(1.0)
    b = tf.constant(-2.0)
    c = tf.constant(1.0)
    
    # Cast the input to tf.float32
    x = tf.cast(x,tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = a*tf.pow(x,2)+b*x+c
    dy_dx = tape.gradient(y,x) 
    
    return((dy_dx,y))

tf.print(f(tf.constant(0.0)))
tf.print(f(tf.constant(1.0)))
# Output:
(-2, 1)
(0, 0)

2. Finding a minimum with the gradient tape and an optimizer

2.1 Using optimizer.apply_gradients

# Find the minimum of f(x) = a*x**2 + b*x + c
# using optimizer.apply_gradients
import tensorflow as tf
x = tf.Variable(0.0,name = "x",dtype = tf.float32)
a = tf.constant(1.0)
b = tf.constant(-2.0)
c = tf.constant(1.0)

lr=0.01

optimizer = tf.keras.optimizers.SGD(learning_rate=lr)
for step in range(200):
    with tf.GradientTape() as tape:
        y = a*tf.pow(x,2) + b*x + c
    dy_dx = tape.gradient(y,x)
    if step % 20 == 0:
        tf.print("step =",step,"; y =",y,"; x =",x)
    # With several variables, pass zip(grads, variables) instead
    optimizer.apply_gradients(grads_and_vars=[(dy_dx,x)])
    # x.assign_sub(lr*dy_dx)  # gives an equivalent result
    
tf.print("y =",y,"; x =",x)
# Output:
step = 0 ; y = 1 ; x = 0
step = 20 ; y = 0.445700467 ; x = 0.332391977
step = 40 ; y = 0.19864893 ; x = 0.554299474
step = 60 ; y = 0.0885379314 ; x = 0.702446759
step = 80 ; y = 0.0394613743 ; x = 0.80135107
step = 100 ; y = 0.0175879598 ; x = 0.867380381
step = 120 ; y = 0.00783896446 ; x = 0.911462188
step = 140 ; y = 0.00349384546 ; x = 0.940891385
step = 160 ; y = 0.00155723095 ; x = 0.960538507
step = 180 ; y = 0.000694036484 ; x = 0.973655164
y = 0.0003221035 ; x = 0.98241204

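What does optimizer.apply_gradients do?

If we want to apply custom operations to the gradients before the model is updated, the approach recommended in TensorFlow is:

1. Compute the gradients with compute_gradients (from 2.x on, the optimizers no longer seem to expose compute_gradients, so use a tape instead)
2. Apply the custom operations to the gradients
3. Use apply_gradients to update the model weights with the processed gradients (in the example above, this can be thought of simply as updating x from learning_rate and dy_dx, equivalent to x.assign_sub(lr*dy_dx))

For step 1, calling optimizer.compute_gradients raises an error: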

       return super(OptimizerV2, self).__getattribute__(name)
AttributeError: 'SGD' object has no attribute 'compute_gradients'
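
The three steps listed above can be sketched with a tape as follows. This is a minimal illustration, assuming TensorFlow 2.x in eager mode. Clipping the gradient to [-1, 1] with tf.clip_by_value is just one example of a custom operation and is not something the original example does.

```python
import tensorflow as tf

x = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Step 1: compute the gradient with a tape.
with tf.GradientTape() as tape:
    y = tf.pow(x, 2) - 2.0 * x + 1.0
grad = tape.gradient(y, x)                     # -2.0 at x = 0

# Step 2: custom operation on the gradient (illustrative: clip to [-1, 1]).
clipped = tf.clip_by_value(grad, -1.0, 1.0)    # -1.0

# Step 3: apply the processed gradient.
optimizer.apply_gradients([(clipped, x)])

# Plain SGD computes x <- x - lr * clipped = 0 - 0.01 * (-1.0) = 0.01
print(x.numpy())
```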

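optimizer reference 1
optimizer reference 2 (official site)

The parameter-update method optimizer.apply_gradients() requires the argument grads_and_vars, i.e. the variables to update (variables in the code below) together with the partial derivatives of the loss function with respect to them (grads in the code below). Specifically, it takes a Python list (or another iterable) in which each element is a (partial derivative of the variable, variable) pair. If b in the example above were also a variable, the argument to pass would be [(grad_x, x), (grad_b, b)]. We call grads = tape.gradient(loss, variables) to get the partial derivatives of the loss recorded on the tape with respect to each variable in variables = [x, b], that is grads = [grad_x, grad_b], and then use Python's zip() function to pair grads = [grad_x, grad_b] with variables = [x, b], which yields the required argument.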

import tensorflow as tf
import numpy as np

X_raw = np.array([2013, 2014, 2015, 2016, 2017], dtype=np.float32)
y_raw = np.array([12000, 14000, 15000, 16500, 17500], dtype=np.float32)

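# Min-max normalize the data (assumed step: X and y were undefined in the original,
# and the raw values would make SGD diverge at this learning rate)
X = (X_raw - X_raw.min()) / (X_raw.max() - X_raw.min())
y = (y_raw - y_raw.min()) / (y_raw.max() - y_raw.min())

X = tf.constant(X)
y = tf.constant(y)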

a = tf.Variable(initial_value=0.)
b = tf.Variable(initial_value=0.)
variables = [a, b]

num_epoch = 10000
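optimizer = tf.keras.optimizers.SGD(learning_rate=5e-4) # The optimizer updates the model parameters from the computed gradients so as to minimize a given loss function
for e in range(num_epoch):
    # Use tf.GradientTape() to record the computation of the loss
    with tf.GradientTape() as tape:
        y_pred = a * X + b
        loss = tf.reduce_sum(tf.square(y_pred - y))
    # TensorFlow automatically computes the gradients of the loss with respect to the model parameters
    grads = tape.gradient(loss, variables)
    # TensorFlow automatically updates the parameters from the gradients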
    optimizer.apply_gradients(grads_and_vars=zip(grads, variables))

2.2 Using optimizer.minimize, which is equivalent to computing the gradient with a tape and then calling apply_gradients

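# Find the minimum of f(x) = a*x**2 + b*x + c
# using optimizer.minimize
# optimizer.minimize is equivalent to computing the gradient with a tape and then calling apply_gradients
x = tf.Variable(0.0,name = "x",dtype = tf.float32)

# Note: f() takes no arguments; it is a closure over the variable x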
def f():   
    a = tf.constant(1.0)
    b = tf.constant(-2.0)
    c = tf.constant(1.0)
    y = a*tf.pow(x,2)+b*x+c
    return(y)

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)   
for _ in range(1000):
    optimizer.minimize(f,[x])   
    
tf.print("y =",f(),"; x =",x)
# Output:
y = 0 ; x = 0.999998569

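Internally, optimizer.minimize performs two steps:

1. Computing the gradients (the compute_gradients step): TensorFlow has two interfaces for computing gradients, tf.gradients for static graphs and tf.GradientTape for dynamic (eager) execution. The former builds the gradient symbolically as part of the graph, while the latter records the operations as they run and replays them backward. If the loss passed in is a callable object, the optimizer uses the backprop.GradientTape interface to compute the gradients; otherwise it uses the gradients.gradients interface.
2. optimizer.apply_gradients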
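
The two steps can be written out by hand. The function `minimize_step` below is a hypothetical helper, not a TensorFlow API. It is a sketch of the callable-loss path only, built from tf.GradientTape and optimizer.apply_gradients, and it assumes TensorFlow 2.x in eager mode.

```python
import tensorflow as tf

def minimize_step(optimizer, loss_fn, var_list):
    # Hypothetical helper: one update step for a callable loss.
    # Step 1: evaluate the loss under a tape and compute the gradients.
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grads = tape.gradient(loss, var_list)
    # Step 2: hand the (gradient, variable) pairs to the optimizer.
    optimizer.apply_gradients(zip(grads, var_list))
    return loss

x = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def f():
    # Closure over x, as in the example above.
    return tf.pow(x, 2) - 2.0 * x + 1.0

for _ in range(200):
    minimize_step(optimizer, f, [x])

# Each step maps (1 - x) to 0.98 * (1 - x), so x is about 1 - 0.98**200 here.
print(x.numpy())
```

After 200 steps x is close to 0.982, the same value as the final x in the apply_gradients run of section 2.1, because each step performs the same plain SGD update.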

2.3 In AutoGraph

# Find the minimum inside AutoGraph
# using optimizer.apply_gradients
x = tf.Variable(0.0,name = "x",dtype = tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

@tf.function
def minimizef():
    a = tf.constant(1.0)
    b = tf.constant(-2.0)
    c = tf.constant(1.0)
    
    for _ in tf.range(1000): # Note: under AutoGraph use tf.range(1000) rather than range(1000)
        with tf.GradientTape() as tape:
            y = a*tf.pow(x,2) + b*x + c
        dy_dx = tape.gradient(y,x)
        optimizer.apply_gradients(grads_and_vars=[(dy_dx,x)])
        
    y = a*tf.pow(x,2) + b*x + c
    return y

tf.print(minimizef())
tf.print(x)
# Output:
0
0.999998569

2.4 Using optimizer.minimize in AutoGraph

# Find the minimum inside AutoGraph
# using optimizer.minimize

x = tf.Variable(0.0,name = "x",dtype = tf.float32)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)   

@tf.function
def f():   
    a = tf.constant(1.0)
    b = tf.constant(-2.0)
    c = tf.constant(1.0)
    y = a*tf.pow(x,2)+b*x+c
    return(y)

@tf.function
def train(epoch):  
    for _ in tf.range(epoch):  
        optimizer.minimize(f,[x])
    return(f())


tf.print(train(1000))
tf.print(x)
# Output:
0
0.999998569