With the official release of TensorFlow 2.0 approaching, I want my existing TF 1.X code to remain usable after upgrading, so I have started building and training models the TF 2.0 way. The 2.0 final release is not out yet, but many of its features are refinements of the v2 module already shipped in TF 1.14, and 1.14 itself is very stable, so I use TF 1.14 to bridge the transition from TF 1.X to TF 2.0. TF 2.0 runs in Eager mode by default, and both the training workflow and the APIs differ substantially from 1.X; material on TF 2.0 is still relatively scarce, so this post is simply a record of my own learning process.
Note: this post only records the examples from my own learning; if anything is lacking, corrections from more experienced readers are very welcome.
The example code in this post is the linear-regression example from Chapter 3 of 《深度学习之TensorFlow 入门、原理与进阶实战》 (李金洪), which fits two-dimensional data (y ≈ 2x plus noise); I rebuild the model in Eager mode, on top of the v2 module in TF 1.14.
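Before diving in, a quick sanity check (my own snippet, not from the book) confirms that the v2 module really does run eagerly under TF 1.14:

import tensorflow.compat.v2 as tf

tf.enable_v2_behavior()

# After enable_v2_behavior(), ops execute immediately, no Session required.
print(tf.executing_eagerly())   # True
print(tf.add(1, 2).numpy())     # 3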
The code from the original book:
import tensorflow.compat.v1 as tf
import numpy as np
import matplotlib.pyplot as plt

train_X = np.linspace(-1, 1, 100)
[data_len] = train_X.shape
train_Y = 2 * train_X + np.random.randn(data_len) * 0.3  # y = 2x, with noise added

# Create placeholders
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

# Model parameters
W = tf.Variable(tf.random.normal([1]), name='weight')
b = tf.Variable(tf.zeros([1]), name='bias')

# Forward structure
z = tf.multiply(X, W) + b

# Backward optimization
cost = tf.reduce_mean(tf.square(Y - z))
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initialize all variables
init = tf.global_variables_initializer()

# Training parameters
training_epochs = 20
display_step = 2

# Holds epoch indices and loss values for plotting
plot_data = {'batchsize': [], 'loss': []}

def moving_average(a, w=10):
    if len(a) < w:
        return a[:]
    return [val if idx < w else sum(a[(idx - w):idx]) / w for idx, val in enumerate(a)]

# Start the session
with tf.Session() as sess:
    sess.run(init)
    # Feed the data into the model
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})
        # Print details during training
        if epoch % display_step == 0:
            loss = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print('Epoch:', epoch + 1, 'cost=', loss, 'W=', sess.run(W), 'b=', sess.run(b))
            if not loss == 'NA':
                plot_data['batchsize'].append(epoch)
                plot_data['loss'].append(loss)
    print('Finished')

    # Plot the training data
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.legend()
    plt.show()

    # Plot the data together with the fitted line
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

    # Plot the smoothed training loss
    plot_data['avgloss'] = moving_average(plot_data['loss'])
    plt.figure(1)
    plt.subplot(211)
    plt.plot(plot_data['batchsize'], plot_data['avgloss'], 'b--')
    plt.xlabel('Minibatch number')
    plt.ylabel('Loss')
    plt.title('Minibatch run vs. Training loss')
    plt.show()
The code rebuilt in Eager mode with the v2 module of TF 1.14:
import tensorflow.compat.v2 as tf
import numpy as np
import matplotlib.pyplot as plt

tf.enable_v2_behavior()

# Create the training data
train_X = np.linspace(-1, 1, 100)
train_Y = 2 * train_X + np.random.randn(*train_X.shape) * 0.3  # y = 2x, with noise added

w = tf.Variable(tf.random.normal([1]), dtype=tf.float32)
b = tf.Variable(tf.zeros([1]), dtype=tf.float32)

# Define the forward structure
@tf.function
def forward(x):
    x = tf.cast(x, tf.float32)
    return x * w + b

opt = tf.optimizers.SGD(0.01)

def run_opt(x, y):
    x = tf.cast(x, tf.float32)
    y = tf.cast(y, tf.float32)
    with tf.GradientTape() as g:
        loss = tf.reduce_mean(tf.square(forward(x) - y))
    # Gather all the variables to be trained together
    train_data = (w, b)
    # Compute the gradients
    gradients = g.gradient(loss, train_data)
    # Update the trained values, here w and b
    opt.apply_gradients(zip(gradients, train_data))
    return loss

# Thanks to TF 2.0's eager (dynamic graph) execution, training can loop directly inside a plain function
def lr():
    losses = []
    weights = []
    biases = []
    for i in range(20):
        sum_loss = 0
        cnt = 0
        for (x, y) in zip(train_X, train_Y):
            loss = run_opt(x, y)
            sum_loss += loss
            cnt += 1
        loss = sum_loss / cnt
        print('Epoch:', i, 'loss:', loss.numpy())
        losses.append(loss)
        weights.append(w.numpy())
        biases.append(b.numpy())
    print('w=', w.numpy(), 'b=', b.numpy())
    return w, losses

weight, ls = lr()

# Plot the loss curve
plt.plot(ls)
plt.show()

# Plot the data together with the fitted line
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.plot(train_X, forward(train_X), label='Fitted line')
plt.legend()
plt.show()

print('W:', weight.numpy(), 'b=', b.numpy())

# Use the trained model for a prediction
z = forward(0.2)
print('w * x + b =', z.numpy())
In Eager mode, the optimization can run directly inside a loop or a function; there is no need to execute it in a Session as in TF 1.X. When training a model in Eager mode, keep the following points in mind:
1: When building the optimization step, gather all the variables to be trained together, then compute the gradients over them, as shown in the snippets below.
2: minimize() itself calls apply_gradients(), so unlike in TF 1.X you no longer have to wire the optimizer up through an explicit minimize() call; you can compute the gradients with tf.GradientTape and call apply_gradients() yourself.
The optimization approach in TF 1.X:
# Backward optimization
cost = tf.reduce_mean(tf.square(Y - z))
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
The optimization approach in TF 1.14 Eager mode:
def run_opt(x, y):
    x = tf.cast(x, tf.float32)
    y = tf.cast(y, tf.float32)
    with tf.GradientTape() as g:
        loss = tf.reduce_mean(tf.square(forward(x) - y))
    # Gather all the variables to be trained together
    train_data = (w, b)
    # Compute the gradients
    gradients = g.gradient(loss, train_data)
    # Update the trained values, here w and b
    opt.apply_gradients(zip(gradients, train_data))
    return loss
For reference, here is the implementation of minimize() in the TF 1.14 v2 optimizer:

def minimize(self, loss, var_list, grad_loss=None, name=None):
    """Minimize `loss` by updating `var_list`.

    This method simply computes gradient using `tf.GradientTape` and calls
    `apply_gradients()`. If you want to process the gradient before applying
    then call `tf.GradientTape` and `apply_gradients()` explicitly instead
    of using this function.

    Args:
      loss: A callable taking no arguments which returns the value to minimize.
      var_list: list or tuple of `Variable` objects to update to minimize
        `loss`, or a callable returning the list or tuple of `Variable` objects.
        Use callable when the variable list would otherwise be incomplete before
        `minimize` since the variables are created at the first time `loss` is
        called.
      grad_loss: Optional. A `Tensor` holding the gradient computed for `loss`.
      name: Optional name for the returned operation.

    Returns:
      An Operation that updates the variables in `var_list`. If `global_step`
      was not `None`, that operation also increments `global_step`.

    Raises:
      ValueError: If some of the variables are not `Variable` objects.
    """
    grads_and_vars = self._compute_gradients(
        loss, var_list=var_list, grad_loss=grad_loss)
    return self.apply_gradients(grads_and_vars, name=name)
And the apply_gradients() that it delegates to:

def apply_gradients(self, grads_and_vars, name=None):
    """Apply gradients to variables.

    This is the second part of `minimize()`. It returns an `Operation` that
    applies gradients.

    Args:
      grads_and_vars: List of (gradient, variable) pairs.
      name: Optional name for the returned operation. Default to the name
        passed to the `Optimizer` constructor.

    Returns:
      An `Operation` that applies the specified gradients. If `global_step`
      was not None, that operation also increments `global_step`.

    Raises:
      TypeError: If `grads_and_vars` is malformed.
      ValueError: If none of the variables have gradients.
    """
    grads_and_vars = _filter_grads(grads_and_vars)
    var_list = [v for (_, v) in grads_and_vars]
    with backend.name_scope(self._scope_ctx):
        # Create iteration if necessary.
        with ops.init_scope():
            _ = self.iterations
            self._create_hypers()
            self._create_slots(var_list)
        self._prepare(var_list)
        return distribute_ctx.get_replica_context().merge_call(
            self._distributed_apply,
            args=(grads_and_vars,),
            kwargs={"name": name})