[Latest] [Translation] [TensorFlow 2.1] Migrate your TensorFlow 1 code to TensorFlow 2

Original: Migrate your TensorFlow 1 code to TensorFlow 2
Translator: Xovee
Translator's homepage: https://www.xovee.cn
Text license: Creative Commons Attribution 4.0 License
Code license: Apache 2.0 License
Translated: March 31, 2020
Applies to: TensorFlow 2.1


Migrate your TensorFlow 1 code to TensorFlow 2

This doc is for users of the low-level TensorFlow APIs. If you are using the high-level APIs (tf.keras), there may be little or no action you need to take to make your code fully TensorFlow 2.0 compatible:

  • Check your optimizer's default learning rate.
  • Note that the "name" that metrics are logged to may have changed.

It is still possible to run 1.X code, unmodified (except for contrib), in TensorFlow 2.0:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

However, this does not let you take advantage of many of the improvements made in TensorFlow 2.0. This guide will help you upgrade your code, making it simpler, more performant, and easier to maintain.

Automatic conversion script

The first step, before attempting to implement the changes described in this doc, is to try running the upgrade script.

This will do an initial pass at upgrading your code to TensorFlow 2.0, but it can't make your code idiomatic to 2.0. Your code may still make use of tf.compat.v1 endpoints to access placeholders, sessions, collections, and other 1.x-style functionality.
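The upgrade script is installed with TensorFlow as a command-line tool; a minimal invocation might look like this (the file names are placeholders):

tf_upgrade_v2 --infile my_tf1_script.py --outfile my_tf1_script_upgraded.py

It also prints a report of the changes it made and flags anything it could not convert automatically.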

Top-level behavioral changes

If your 2.0 code uses tf.compat.v1.disable_v2_behavior(), there are still a number of global behavioral changes you may have to address manually. The major ones are (sketched in code after this list):

  • Eager execution, v1.enable_eager_execution(): Any code that implicitly uses a tf.Graph will fail. Be sure to wrap such code in a with tf.Graph().as_default() context.
  • Resource variables, v1.enable_resource_variables(): Some code may depend on the non-deterministic behavior of TF reference variables. Resource variables are locked while being written to, and so provide stronger consistency guarantees.
    • This may change behavior in edge cases.
    • This may create extra copies and can have higher memory usage.
    • It can be disabled by passing use_resource=False to the tf.Variable constructor.
  • Tensor shapes, v1.enable_v2_tensorshape(): TF 2.0 simplifies the behavior of tensor shapes; for example, you can write t.shape[0] instead of t.shape[0].value. These changes are small, and fixing them should be easy. See TensorShape for more examples.
  • Control flow, v1.enable_control_flow_v2(): The TF 2.0 control flow implementation has been simplified, so the graph representations it produces are different. If you run into problems, please file a bug.
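As a minimal sketch of how these switches fit together, tf.compat.v1 exposes an enable_* counterpart for each behavior above, so you can turn them back on one at a time while isolating a problem:

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()        # start from full 1.x behavior...
tf.enable_eager_execution()     # ...then opt back in to individual 2.0 behaviors
tf.enable_resource_variables()
tf.enable_v2_tensorshape()
tf.enable_control_flow_v2()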

Make the code 2.0-native

Here are some examples of converting TensorFlow 1.x code to TensorFlow 2.0. These changes will let your code take advantage of performance optimizations and simplified API calls.

1. Replace v1.Session.run calls

Every v1.Session.run call should be replaced by a Python function:

  • The feed_dict and v1.placeholders become function arguments.
  • The fetches become the function's return value.
  • Eager execution allows standard Python debugging tools like pdb.

After that, add a tf.function decorator to make it run efficiently as a graph. The Autograph guide explains how this works.
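A minimal sketch of the change, where f, the placeholder, and the input stand in for your own code:

import tensorflow as tf

# TF 1.x: feed values through a session
#   outputs = session.run(f(placeholder), feed_dict={placeholder: input})
# TF 2.0: an ordinary Python call, optionally compiled with tf.function
@tf.function
def f(x):
  return x ** 2

outputs = f(tf.constant(2.0))  # executes as a graph and returns a tf.Tensor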

A couple of points to note:

  • Unlike v1.Session.run, a tf.function has a fixed return signature and always returns all of its outputs. If this causes performance problems, create two separate functions.
  • There is no need for tf.control_dependencies or similar operations: a tf.function behaves as if it were run in the order written. tf.Variable assignments and tf.asserts, for example, are executed automatically.

2. Use Python objects to track variables and losses

All name-based variable tracking is strongly discouraged in TF 2.0. Use Python objects to track variables instead.

Use tf.Variable instead of v1.get_variable.

Every v1.variable_scope should be converted to a Python object. Typically this will be one of:

  • tf.keras.layers.Layer
  • tf.keras.Model
  • tf.Variable

If you need to aggregate lists of variables (like tf.Graph.get_collection(tf.GraphKeys.VARIABLES)), use the .variables and .trainable_variables attributes of the Layer and Model objects.
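For instance, a minimal sketch of reading variables off a Keras layer instead of a global collection:

import tensorflow as tf

layer = tf.keras.layers.Dense(4)
layer.build((None, 8))                 # creates the kernel and bias variables
print(len(layer.trainable_variables))  # 2, tracked by the layer itself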

The Layer and Model classes implement several other properties that remove the need for global collections. Their .losses property, for example, can replace the tf.GraphKeys.LOSSES collection.

See the keras guides for details.

Warning: Many tf.compat.v1 symbols use global collections implicitly.

3. Upgrade your training loops

Use the highest-level API that works for your use case. Prefer tf.keras.Model.fit over building your own training loops.

These high-level functions manage a lot of the low-level details that are easy to miss if you write your own training loop. For example, they automatically collect the regularization losses, and set the training=True argument when calling the model.

4. Upgrade your data input pipelines

Use tf.data datasets for data input. These objects are efficient, expressive, and integrate well with TensorFlow.

They can be passed directly to the tf.keras.Model.fit method:

model.fit(dataset, epochs=5)

Or they can be iterated over directly with standard Python:

for example_batch, label_batch in dataset:
	break

5. Remove compat.v1 symbols

The tf.compat.v1 module contains the complete TensorFlow 1.x API, with its original semantics.

The TF2 upgrade script will convert symbols to their 2.0 equivalents if the conversion is safe, i.e., if it can determine that the 2.0 version is exactly equivalent (for instance, it will rename v1.arg_max to tf.argmax, since they are the same function).

After the upgrade script runs, your code may still contain many compat.v1 endpoints. It is worth going through the code by hand and converting these to their 2.0 equivalents (they are mentioned in the conversion log).
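As a small illustration of the kind of safe rename the script performs (logits is just a stand-in tensor):

import tensorflow as tf

logits = tf.constant([[0.1, 2.0, 0.3]])
# The script rewrites v1.arg_max(logits, 1) to the exactly-equivalent 2.0 symbol:
x = tf.argmax(logits, 1)
print(x.numpy())  # [1]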

Converting your models

Setup

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf


import tensorflow_datasets as tfds

Low-level variables & operator execution

This pattern of low-level API use includes:

  • using variable scopes to control reuse
  • creating variables with v1.get_variable
  • accessing collections explicitly, or implicitly with methods like v1.global_variables and v1.losses.get_regularization_loss
  • using v1.placeholder to set up graph inputs
  • executing graphs with Session.run
  • initializing variables manually

Before converting

Here is what the code may look like in classic TensorFlow 1.x syntax:

in_a = tf.placeholder(dtype=tf.float32, shape=(2))
in_b = tf.placeholder(dtype=tf.float32, shape=(2))

def forward(x):
  with tf.variable_scope("matmul", reuse=tf.AUTO_REUSE):
    W = tf.get_variable("W", initializer=tf.ones(shape=(2,2)),
                        regularizer=tf.contrib.layers.l2_regularizer(0.04))
    b = tf.get_variable("b", initializer=tf.zeros(shape=(2)))
    return W * x + b

out_a = forward(in_a)
out_b = forward(in_b)

reg_loss=tf.losses.get_regularization_loss(scope="matmul")

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  outs = sess.run([out_a, out_b, reg_loss],
                feed_dict={in_a: [1, 0], in_b: [0, 1]})

After converting

In the converted code:

  • The variables are local Python objects.
  • The forward function still defines the calculation.
  • The Session.run call is replaced with a call to forward.
  • The optional tf.function decorator can be added for performance.
  • The regularizations are calculated manually, without referring to any global collection.
  • No sessions, no placeholders.
W = tf.Variable(tf.ones(shape=(2,2)), name="W")
b = tf.Variable(tf.zeros(shape=(2)), name="b")

@tf.function
def forward(x):
  return W * x + b

out_a = forward([1,0])
print(out_a)
tf.Tensor(
[[1. 0.]
 [1. 0.]], shape=(2, 2), dtype=float32)
out_b = forward([0,1])

regularizer = tf.keras.regularizers.l2(0.04)
reg_loss=regularizer(W)

Models based on tf.layers

The v1.layers module contains layer functions that rely on v1.variable_scope to define and reuse variables.

Before converting
def model(x, training, scope='model'):
  with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
    x = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu,
          kernel_regularizer=tf.contrib.layers.l2_regularizer(0.04))
    x = tf.layers.max_pooling2d(x, (2, 2), 1)
    x = tf.layers.flatten(x)
    x = tf.layers.dropout(x, 0.1, training=training)
    x = tf.layers.dense(x, 64, activation=tf.nn.relu)
    x = tf.layers.batch_normalization(x, training=training)
    x = tf.layers.dense(x, 10)
    return x

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)
After converting

Most of the arguments stayed the same. But note the differences:

  • The training argument is passed to each layer by the model when it runs.
  • The first argument to the original model function (the input x) is gone. This is because object layers separate building the model from calling the model.

Also note that:

  • If you were using regularizers or initializers from tf.contrib, these have more argument changes than others.
  • The code no longer writes to collections, so functions like v1.losses.get_regularization_loss will no longer return these values, possibly breaking your training loops.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.04),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))
train_out = model(train_data, training=True)
print(train_out)
tf.Tensor([[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]], shape=(1, 10), dtype=float32)
test_out = model(test_data, training=False)
print(test_out)
tf.Tensor(
[[-0.06551158  0.00366845 -0.04681937 -0.03203971  0.30431384  0.04084986
  -0.02323238 -0.23164497  0.02273249  0.04298191]], shape=(1, 10), dtype=float32)
# Here are all the trainable variables.
len(model.trainable_variables)
8
# Here is the regularization loss.
model.losses
[<tf.Tensor: shape=(), dtype=float32, numpy=0.082791135>]

Mixed variables & v1.layers

Existing code often mixes lower-level TF 1.x variables with higher-level v1.layers operations.

Before converting
def model(x, training, scope='model'):
  with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
    W = tf.get_variable(
      "W", dtype=tf.float32,
      initializer=tf.ones(shape=x.shape),
      regularizer=tf.contrib.layers.l2_regularizer(0.04),
      trainable=True)
    if training:
      x = x + W
    else:
      x = x + W * 0.5
    x = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, (2, 2), 1)
    x = tf.layers.flatten(x)
    return x

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)
After converting

To convert this code, follow the pattern of mapping layers to layers, as in the previous example.

Each v1.variable_scope is effectively a layer of its own, so rewrite it as a tf.keras.layers.Layer. See this guide for details.

The general pattern is:

  • Collect layer parameters in __init__.
  • Build the variables in build.
  • Execute the calculations in call, and return the result.


# Create a custom layer for part of the model
class CustomLayer(tf.keras.layers.Layer):
  def __init__(self, *args, **kwargs):
    super(CustomLayer, self).__init__(*args, **kwargs)

  def build(self, input_shape):
    self.w = self.add_weight(
        shape=input_shape[1:],
        dtype=tf.float32,
        initializer=tf.keras.initializers.ones(),
        regularizer=tf.keras.regularizers.l2(0.02),
        trainable=True)

  # Call method will sometimes get used in graph mode,
  # training will get turned into a tensor
  @tf.function
  def call(self, inputs, training=None):
    if training:
      return inputs + self.w
    else:
      return inputs + self.w * 0.5
custom_layer = CustomLayer()
print(custom_layer([1]).numpy())
print(custom_layer([1], training=True).numpy())
[1.5]
[2.]
train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))

# Build the model including the custom layer
model = tf.keras.Sequential([
    CustomLayer(input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
])

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

Some things to note:

  • Subclassed Keras models and layers need to run in both v1 graphs (with no automatic control dependencies) and in eager mode
    • Wrap the call() in a tf.function() to get autograph and automatic control dependencies
  • Don't forget to accept a training argument to call
    • Sometimes it is a tf.Tensor
    • Sometimes it is a Python boolean
  • Create model variables in the constructor or in Model.build using self.add_weight()
  • Don't keep tf.Tensors in your objects
    • They might get created either in a tf.function or in the eager context, and these tensors behave differently.
    • Use tf.Variables for state; they are always usable from both contexts
    • tf.Tensors are only for intermediate values

A note on Slim & contrib.layers

A large amount of older TensorFlow 1.x code uses the Slim library, which was packaged with TensorFlow 1.x as tf.contrib.layers. As a contrib module, it is no longer available in TensorFlow 2.0, even in tf.compat.v1. Converting code that depends on Slim to TF 2.0 is therefore more involved than converting code that uses v1.layers. In fact, it may make sense to convert your Slim code to v1.layers first, and then to Keras.

  • Remove arg_scopes; all args need to be explicit
  • If you use them, split normalizer_fn and activation_fn into their own layers
  • Separable conv layers map to one or more different Keras layers (depthwise, pointwise, and separable Keras layers)
  • Slim and v1.layers have different argument names and default values
  • Some args have different scales
  • If you use Slim pre-trained models, try out Keras pre-trained models from tf.keras.applications, or TF Hub's TF2 SavedModels exported from the original Slim code

Some tf.contrib layers might not have been moved to core TensorFlow, but have instead been moved to the TF add-ons package.

Training

There are many ways to feed data to a tf.keras model. You can use Python generators or Numpy arrays as model input.

The recommended way to feed data to a model is to use the tf.data package, which contains a collection of high-performance classes for manipulating data.

If you are still using tf.queue, these are now supported only as data structures, not as input pipelines.

Using Datasets

The TensorFlow Datasets package (tfds) contains utilities for loading predefined datasets as tf.data.Dataset objects.

In this example, load the MNIST dataset using tfds:

datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']
Downloading and preparing dataset mnist/3.0.1 (download: 11.06 MiB, generated: 21.00 MiB, total: 32.06 MiB) to /home/kbuilder/tensorflow_datasets/mnist/3.0.1...

WARNING:absl:Dataset mnist is hosted on GCS. It will automatically be downloaded to your
local data directory. If you'd instead prefer to read directly from our public
GCS bucket (recommended if you're running on GCP), you can instead set
data_dir=gs://tfds-data/datasets.


HBox(children=(FloatProgress(value=0.0, description='Dl Completed...', max=4.0, style=ProgressStyle(descriptio…


Dataset mnist downloaded and prepared to /home/kbuilder/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.

Then prepare the data for training:

  • Rescale each image.
  • Shuffle the order of the examples.
  • Collect batches of images and labels.
BUFFER_SIZE = 10 # Use a much larger value for real code.
BATCH_SIZE = 64
NUM_EPOCHS = 5


def scale(image, label):
  image = tf.cast(image, tf.float32)
  image /= 255

  return image, label

To keep the example short, trim the dataset to only return 5 batches:

train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
test_data = mnist_test.map(scale).batch(BATCH_SIZE)

STEPS_PER_EPOCH = 5

train_data = train_data.take(STEPS_PER_EPOCH)
test_data = test_data.take(STEPS_PER_EPOCH)
image_batch, label_batch = next(iter(train_data))

Use Keras training loops

If you don't need low-level control of your training process, using Keras's built-in fit, evaluate, and predict methods is recommended. These methods provide a uniform interface to train the model regardless of the implementation (sequential, functional, or sub-classed).

The advantages of these methods include:

  • They accept Numpy arrays, Python generators, and tf.data.Datasets.
  • They apply regularization and activation losses automatically.
  • They support tf.distribute for multi-device training.
  • They support arbitrary callables as losses and metrics.
  • They support callbacks like tf.keras.callbacks.TensorBoard, and custom callbacks.
  • They are performant, automatically using TensorFlow graphs.

Here is an example of training a model using a Dataset. (For details on how this works, see this tutorial.)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_data, epochs=NUM_EPOCHS)
loss, acc = model.evaluate(test_data)

print("Loss {}, Accuracy {}".format(loss, acc))
Epoch 1/5
5/5 [==============================] - 1s 130ms/step - loss: 1.6645 - accuracy: 0.4781
Epoch 2/5
5/5 [==============================] - 0s 18ms/step - loss: 0.4381 - accuracy: 0.9000
Epoch 3/5
5/5 [==============================] - 0s 17ms/step - loss: 0.2780 - accuracy: 0.9656
Epoch 4/5
5/5 [==============================] - 0s 17ms/step - loss: 0.2142 - accuracy: 0.9781
Epoch 5/5
5/5 [==============================] - 0s 17ms/step - loss: 0.1694 - accuracy: 0.9906
      5/Unknown - 0s 24ms/step - loss: 1.5080 - accuracy: 0.6250Loss 1.5080101490020752, Accuracy 0.625

Write your own loop

If the Keras model's training step works for you, but you need more control outside that step, consider using the tf.keras.Model.train_on_batch method in your own data-iteration loop.

Remember: many things can be implemented as a tf.keras.callbacks.Callback.

This method has many of the advantages of the methods mentioned in the previous section, but gives the user control of the outer loop.

You can also use tf.keras.Model.test_on_batch or tf.keras.Model.evaluate to check performance during training.

Note: train_on_batch and test_on_batch both return the loss and metrics for the single batch by default. If you pass reset_metrics=False, they return accumulated metrics, and you must remember to appropriately reset the metric accumulators. Also remember that some metrics like AUC require reset_metrics=False to be calculated correctly.

Continuing the example above:

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

for epoch in range(NUM_EPOCHS):
  #Reset the metric accumulators
  model.reset_metrics()

  for image_batch, label_batch in train_data:
    result = model.train_on_batch(image_batch, label_batch)
    metrics_names = model.metrics_names
    print("train: ",
          "{}: {:.3f}".format(metrics_names[0], result[0]),
          "{}: {:.3f}".format(metrics_names[1], result[1]))
  for image_batch, label_batch in test_data:
    result = model.test_on_batch(image_batch, label_batch,
                                 # return accumulated metrics
                                 reset_metrics=False)
  metrics_names = model.metrics_names
  print("\neval: ",
        "{}: {:.3f}".format(metrics_names[0], result[0]),
        "{}: {:.3f}".format(metrics_names[1], result[1]))
train:  loss: 0.147 accuracy: 1.000
train:  loss: 0.179 accuracy: 0.969
train:  loss: 0.157 accuracy: 0.984
train:  loss: 0.223 accuracy: 0.969
train:  loss: 0.197 accuracy: 0.953

eval:  loss: 1.551 accuracy: 0.669
train:  loss: 0.096 accuracy: 1.000
train:  loss: 0.098 accuracy: 1.000
train:  loss: 0.086 accuracy: 1.000
train:  loss: 0.149 accuracy: 0.984
train:  loss: 0.136 accuracy: 0.969

eval:  loss: 1.480 accuracy: 0.753
train:  loss: 0.080 accuracy: 1.000
train:  loss: 0.080 accuracy: 1.000
train:  loss: 0.076 accuracy: 1.000
train:  loss: 0.079 accuracy: 1.000
train:  loss: 0.067 accuracy: 1.000

eval:  loss: 1.438 accuracy: 0.797
train:  loss: 0.061 accuracy: 1.000
train:  loss: 0.068 accuracy: 1.000
train:  loss: 0.065 accuracy: 1.000
train:  loss: 0.068 accuracy: 1.000
train:  loss: 0.065 accuracy: 1.000

eval:  loss: 1.435 accuracy: 0.816
train:  loss: 0.055 accuracy: 1.000
train:  loss: 0.058 accuracy: 1.000
train:  loss: 0.050 accuracy: 1.000
train:  loss: 0.056 accuracy: 1.000
train:  loss: 0.053 accuracy: 1.000

eval:  loss: 1.434 accuracy: 0.800

Customize the training step

If you need more flexibility and control over the training process, you can implement your own training loop. There are three steps:

  1. Iterate over a Python generator or tf.data.Dataset to get batches of examples.
  2. Use tf.GradientTape to collect gradients.
  3. Use one of the tf.keras.optimizers to apply weight updates to the model's variables.

Remember:

  • Always include a training argument on the call method of subclassed layers and models.
  • Make sure to call the model with the training argument set correctly.
  • Depending on usage, model variables may not exist until the model is run on a batch of data.
  • You need to manually handle things like regularization losses for the model.

Note the simplifications relative to v1:

  • There is no need to run variable initializers; variables are initialized when created.
  • There is no need to add manual control dependencies; even in tf.function, operations act as in eager mode.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(inputs, labels):
  with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    regularization_loss=tf.math.add_n(model.losses)
    pred_loss=loss_fn(labels, predictions)
    total_loss=pred_loss + regularization_loss

  gradients = tape.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

for epoch in range(NUM_EPOCHS):
  for inputs, labels in train_data:
    train_step(inputs, labels)
  print("Finished epoch", epoch)
Finished epoch 0
Finished epoch 1
Finished epoch 2
Finished epoch 3
Finished epoch 4

New-style metrics and losses

In TensorFlow 2.0, metrics and losses are now objects. They work both eagerly and in tf.functions.

A loss object is callable, and expects (y_true, y_pred) as arguments:

cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
cce([[1, 0]], [[-1.0,3.0]]).numpy()
4.01815

A metric object has the following methods:

  • Metric.update_state(): add new observations
  • Metric.result(): get the current result of the metric, given the observed values
  • Metric.reset_states(): clear all observations

The object itself is callable. Calling it updates the state with new observations, as with update_state, and returns the new result of the metric.
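A quick sketch of both ways of feeding a metric:

m = tf.keras.metrics.Mean()
m.update_state([1, 2, 3])   # explicit update
print(m.result().numpy())   # 2.0
print(m([4]).numpy())       # calling also updates: mean of [1, 2, 3, 4] -> 2.5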

You don't have to manually initialize a metric's variables, and because TensorFlow 2.0 has automatic control dependencies, you don't need to worry about those either.

The code below uses a metric to keep track of the mean loss observed within a custom training loop.

# Create the metrics
loss_metric = tf.keras.metrics.Mean(name='train_loss')
accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

@tf.function
def train_step(inputs, labels):
  with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    regularization_loss=tf.math.add_n(model.losses)
    pred_loss=loss_fn(labels, predictions)
    total_loss=pred_loss + regularization_loss

  gradients = tape.gradient(total_loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  # Update the metrics
  loss_metric.update_state(total_loss)
  accuracy_metric.update_state(labels, predictions)


for epoch in range(NUM_EPOCHS):
  # Reset the metrics
  loss_metric.reset_states()
  accuracy_metric.reset_states()

  for inputs, labels in train_data:
    train_step(inputs, labels)
  # Get the metric results
  mean_loss=loss_metric.result()
  mean_accuracy = accuracy_metric.result()

  print('Epoch: ', epoch)
  print('  loss:     {:.3f}'.format(mean_loss))
  print('  accuracy: {:.3f}'.format(mean_accuracy))
Epoch:  0
  loss:     0.125
  accuracy: 0.997
Epoch:  1
  loss:     0.106
  accuracy: 1.000
Epoch:  2
  loss:     0.091
  accuracy: 1.000
Epoch:  3
  loss:     0.085
  accuracy: 0.997
Epoch:  4
  loss:     0.072
  accuracy: 1.000

Keras metric names

In TensorFlow 2.0, Keras models are more consistent about handling metric names.

Now when you pass a string in the list of metrics, that exact string is used as the metric's name. These names are visible in the history object returned by model.fit, and in the logs passed to keras.callbacks, where they are set to the exact string you passed in the metrics list.

model.compile(
    optimizer = tf.keras.optimizers.Adam(0.001),
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics = ['acc', 'accuracy', tf.keras.metrics.SparseCategoricalAccuracy(name="my_accuracy")])
history = model.fit(train_data)
5/5 [==============================] - 1s 110ms/step - loss: 0.0832 - acc: 1.0000 - accuracy: 1.0000 - my_accuracy: 1.0000
history.history.keys()
dict_keys(['loss', 'acc', 'accuracy', 'my_accuracy'])

This differs from previous versions, where passing metrics=["accuracy"] would result in dict_keys(['loss', 'acc']).

Keras optimizers

The optimizers in v1.train, like v1.train.AdamOptimizer and v1.train.GradientDescentOptimizer, have equivalents in tf.keras.optimizers.

v1.train转换到keras.optimizers

Here are things to keep in mind when converting your optimizers:

  • Upgrading your optimizers may make old checkpoints incompatible.
  • All epsilons now default to 1e-7 instead of 1e-8 (this is negligible in most use cases).
  • v1.train.GradientDescentOptimizer can be directly replaced by tf.keras.optimizers.SGD.
  • v1.train.MomentumOptimizer can be replaced by the SGD optimizer using the momentum argument: tf.keras.optimizers.SGD(..., momentum=...).
  • v1.train.AdamOptimizer can be converted to tf.keras.optimizers.Adam; the beta1 and beta2 arguments have been renamed to beta_1 and beta_2.
  • v1.train.RMSPropOptimizer can be converted to tf.keras.optimizers.RMSprop; the decay argument has been renamed to rho.

New defaults for some tf.keras.optimizers

Warning: If you see a change in convergence behavior for your models, check the default learning rates.

There are no changes for optimizers.SGD, optimizers.Adam, or optimizers.RMSprop.

The following default learning rates have changed:

  • optimizers.Adagrad: from 0.01 to 0.001
  • optimizers.Adadelta: from 1.0 to 0.001
  • optimizers.Adamax: from 0.002 to 0.001
  • optimizers.Nadam: from 0.002 to 0.001
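If your model's convergence depends on the old defaults, the simplest workaround is to pass the old value explicitly; for example, for Adagrad with its former 0.01 default:

optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.01)  # restore the old v1-era default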

TensorBoard

TensorFlow 2 includes significant changes to the tf.summary API. For an introduction to the new tf.summary, there are several tutorials available, including a TensorBoard TF 2 migration guide.

Saving & loading

Checkpoint compatibility

TensorFlow 2.0 uses object-based checkpoints.

Old-style name-based checkpoints can still be loaded, with care. The code conversion process may result in variable name changes, but there are workarounds.

The simplest approach is to line up the names of the new model with the names in the checkpoint:

  • Variables still all take a name argument you can set.
  • Keras models also take a name argument, which they set as the prefix for their variables.
  • The v1.name_scope function can be used to set variable-name prefixes. This is very different from tf.variable_scope: it only affects names, and doesn't track variables or reuse.

If that does not work for your use case, try the v1.train.init_from_checkpoint function. It takes an assignment_map argument, which specifies the mapping from old names to new names.
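A minimal sketch, with a hypothetical checkpoint path and scope names (note this v1 function is designed to run in graph mode):

tf.compat.v1.train.init_from_checkpoint(
    '/path/to/tf1_checkpoint',
    assignment_map={'old_scope/': 'new_scope/'})  # remap a whole variable-name prefix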

Note: Unlike object-based checkpoints, which can defer loading, name-based checkpoints require that all variables be built when the function is called. Some models defer building variables until you call build or run the model on a batch of data.

The TensorFlow Estimator repository includes a conversion tool to upgrade the checkpoints for premade estimators from TensorFlow 1.X to 2.0. It may serve as an example of how to build such a tool for a similar use case.

Saved-model compatibility

There are no significant compatibility concerns for saved models.

  • TensorFlow 1.x saved_models work in TensorFlow 2.x.
  • TensorFlow 2.x saved_models even work in TensorFlow 1.x, if all the ops are supported.
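For example, a round trip in 2.x uses tf.saved_model directly (the path is a placeholder):

# Export a Keras model (such as the `model` built above) as a SavedModel...
tf.saved_model.save(model, '/tmp/my_saved_model')
# ...and load it back. tf.saved_model.load also reads TF 1.x SavedModels.
loaded = tf.saved_model.load('/tmp/my_saved_model')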

Graph.pb or Graph.pbtxt

There is no straightforward way to upgrade a raw Graph.pb file to TensorFlow 2.0. Your best bet is to upgrade the code that generated the file.

But if you have a "frozen graph" (a tf.Graph where the variables have been turned into constants), then it is possible to convert it to a concrete_function using v1.wrap_function:

def wrap_frozen_graph(graph_def, inputs, outputs):
  def _imports_graph_def():
    tf.compat.v1.import_graph_def(graph_def, name="")
  wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
  import_graph = wrapped_import.graph
  return wrapped_import.prune(
      tf.nest.map_structure(import_graph.as_graph_element, inputs),
      tf.nest.map_structure(import_graph.as_graph_element, outputs))

For example, here is a frozen graph for Inception v1, from 2016:

path = tf.keras.utils.get_file(
    'inception_v1_2016_08_28_frozen.pb',
    'http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz',
    untar=True)
Downloading data from http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz
24698880/24695710 [==============================] - 1s 0us/step

Load the tf.GraphDef:

graph_def = tf.compat.v1.GraphDef()
loaded = graph_def.ParseFromString(open(path,'rb').read())

Wrap it in a concrete_function:

inception_func = wrap_frozen_graph(
    graph_def, inputs='input:0',
    outputs='InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu:0')

Pass it a tensor as input:

input_img = tf.ones([1,224,224,3], dtype=tf.float32)
inception_func(input_img).shape
TensorShape([1, 28, 28, 96])

Estimators

Training with Estimators

Estimators are supported in TensorFlow 2.0.

When you use estimators, you can use input_fn(), tf.estimator.TrainSpec, and tf.estimator.EvalSpec from TensorFlow 1.x.

Here is an example using input_fn with train and evaluate specs.

Creating the input_fn and train/eval specs
# Define the estimator's input_fn
def input_fn():
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  BUFFER_SIZE = 10000
  BATCH_SIZE = 64

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255

    return image, label[..., tf.newaxis]

  train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
  return train_data.repeat()

# Define train & eval specs
train_spec = tf.estimator.TrainSpec(input_fn=input_fn,
                                    max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn,
                                  steps=STEPS_PER_EPOCH)

Using a Keras model definition

There are some differences in how to construct your estimators in TensorFlow 2.0.

We recommend that you define your model using Keras, then use the tf.keras.estimator.model_to_estimator utility to turn your model into an estimator. The code below shows how to use this utility when creating and training an estimator.

def make_model():
  return tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10)
  ])
model = make_model()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

estimator = tf.keras.estimator.model_to_estimator(
  keras_model = model
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.

INFO:tensorflow:Using default config.

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpfzr8hjlh

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpfzr8hjlh

INFO:tensorflow:Using the Keras model provided.

INFO:tensorflow:Using the Keras model provided.

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpfzr8hjlh', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpfzr8hjlh', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Not using Distribute Coordinator.

INFO:tensorflow:Not using Distribute Coordinator.

INFO:tensorflow:Running training and evaluation locally (non-distributed).

INFO:tensorflow:Running training and evaluation locally (non-distributed).

INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.

INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.

WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmpfzr8hjlh/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})

INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmpfzr8hjlh/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})

INFO:tensorflow:Warm-starting from: /tmp/tmpfzr8hjlh/keras/keras_model.ckpt

INFO:tensorflow:Warm-starting from: /tmp/tmpfzr8hjlh/keras/keras_model.ckpt

INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.

INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.

INFO:tensorflow:Warm-started 8 variables.

INFO:tensorflow:Warm-started 8 variables.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpfzr8hjlh/model.ckpt.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpfzr8hjlh/model.ckpt.

INFO:tensorflow:loss = 2.462402, step = 0

INFO:tensorflow:loss = 2.462402, step = 0

INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpfzr8hjlh/model.ckpt.

INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmpfzr8hjlh/model.ckpt.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2020-03-28T01:53:11Z

INFO:tensorflow:Starting evaluation at 2020-03-28T01:53:11Z

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from /tmp/tmpfzr8hjlh/model.ckpt-25

INFO:tensorflow:Restoring parameters from /tmp/tmpfzr8hjlh/model.ckpt-25

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Evaluation [1/5]

INFO:tensorflow:Evaluation [1/5]

INFO:tensorflow:Evaluation [2/5]

INFO:tensorflow:Evaluation [2/5]

INFO:tensorflow:Evaluation [3/5]

INFO:tensorflow:Evaluation [3/5]

INFO:tensorflow:Evaluation [4/5]

INFO:tensorflow:Evaluation [4/5]

INFO:tensorflow:Evaluation [5/5]

INFO:tensorflow:Evaluation [5/5]

INFO:tensorflow:Inference Time : 0.84085s

INFO:tensorflow:Inference Time : 0.84085s

INFO:tensorflow:Finished evaluation at 2020-03-28-01:53:11

INFO:tensorflow:Finished evaluation at 2020-03-28-01:53:11

INFO:tensorflow:Saving dict for global step 25: accuracy = 0.69375, global_step = 25, loss = 1.55557

INFO:tensorflow:Saving dict for global step 25: accuracy = 0.69375, global_step = 25, loss = 1.55557

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpfzr8hjlh/model.ckpt-25

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmpfzr8hjlh/model.ckpt-25

INFO:tensorflow:Loss for final step: 0.45250922.

INFO:tensorflow:Loss for final step: 0.45250922.

({'accuracy': 0.69375, 'loss': 1.55557, 'global_step': 25}, [])

Using a custom model_fn

If you have an existing custom estimator model_fn that you need to maintain, you can convert it to use a Keras model.

However, for compatibility reasons, a custom model_fn will still run in 1.x-style graph mode. This means there is no eager execution and no automatic control dependencies.

Custom model_fn with minimal changes

To make your custom model_fn work in TF 2.0 with minimal changes to your existing code, tf.compat.v1 symbols, such as optimizers and metrics, can be used.

Using a Keras model in a custom model_fn is similar to using it in a custom training loop:

  • Set the training phase appropriately, based on the mode argument.
  • Explicitly pass the model's trainable_variables to the optimizer.

There are, however, important differences relative to a custom training loop:

  • Extract the losses using Model.get_losses_for instead of Model.losses.
  • Extract the model's updates using Model.get_updates_for.

Note: "Updates" are changes that need to be applied to a model after each batch. For example, the moving averages of the mean and variance in a layers.BatchNormalization layer.

The code below creates an estimator from a custom model_fn, illustrating all of these concerns.

def my_model_fn(features, labels, mode):
  model = make_model()

  optimizer = tf.compat.v1.train.AdamOptimizer()
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  predictions = model(features, training=training)

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss=loss_fn(labels, predictions) + tf.math.add_n(reg_losses)

  accuracy = tf.compat.v1.metrics.accuracy(labels=labels,
                                           predictions=tf.math.argmax(predictions, axis=1),
                                           name='acc_op')

  update_ops = model.get_updates_for(None) + model.get_updates_for(features)
  minimize_op = optimizer.minimize(
      total_loss,
      var_list=model.trainable_variables,
      global_step=tf.compat.v1.train.get_or_create_global_step())
  train_op = tf.group(minimize_op, update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op, eval_metric_ops={'accuracy': accuracy})

# Create the Estimator & Train
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.

INFO:tensorflow:Using default config.

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp8g2a8yh1

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp8g2a8yh1

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp8g2a8yh1', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp8g2a8yh1', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Not using Distribute Coordinator.

INFO:tensorflow:Not using Distribute Coordinator.

INFO:tensorflow:Running training and evaluation locally (non-distributed).

INFO:tensorflow:Running training and evaluation locally (non-distributed).

INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.

INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:loss = 2.4837275, step = 0

INFO:tensorflow:loss = 2.4837275, step = 0

INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2020-03-28T01:53:14Z

INFO:tensorflow:Starting evaluation at 2020-03-28T01:53:14Z

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Restoring parameters from /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Evaluation [1/5]

INFO:tensorflow:Evaluation [1/5]

INFO:tensorflow:Evaluation [2/5]

INFO:tensorflow:Evaluation [2/5]

INFO:tensorflow:Evaluation [3/5]

INFO:tensorflow:Evaluation [3/5]

INFO:tensorflow:Evaluation [4/5]

INFO:tensorflow:Evaluation [4/5]

INFO:tensorflow:Evaluation [5/5]

INFO:tensorflow:Evaluation [5/5]

INFO:tensorflow:Inference Time : 0.96313s

INFO:tensorflow:Inference Time : 0.96313s

INFO:tensorflow:Finished evaluation at 2020-03-28-01:53:15

INFO:tensorflow:Finished evaluation at 2020-03-28-01:53:15

INFO:tensorflow:Saving dict for global step 25: accuracy = 0.553125, global_step = 25, loss = 1.7363777

INFO:tensorflow:Saving dict for global step 25: accuracy = 0.553125, global_step = 25, loss = 1.7363777

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Loss for final step: 0.4678836.

INFO:tensorflow:Loss for final step: 0.4678836.

({'accuracy': 0.553125, 'loss': 1.7363777, 'global_step': 25}, [])
Custom model_fn with TF 2.0 symbols

If you want to get rid of all TF 1.x symbols and upgrade your custom model_fn to native TF 2.0, you need to update the optimizer and metrics to tf.keras.optimizers and tf.keras.metrics.

In the custom model_fn, besides the above changes, more upgrades need to be made:

  • Use tf.keras.optimizers instead of v1.train.Optimizer.
  • Explicitly pass the model's trainable_variables to the optimizer, since tf.keras.optimizers are decoupled from collections.
  • To compute the train_op/minimize_op:
    • Use Optimizer.get_updates() if the loss is a scalar loss Tensor (not a callable); the first element in the returned list is the desired op.
    • Use Optimizer.minimize() if the loss is a callable (such as a function).
  • Use tf.keras.metrics instead of tf.compat.v1.metrics for evaluation.

For the my_model_fn example above, the migrated code with 2.0 symbols is shown below:

def my_model_fn(features, labels, mode):
  model = make_model()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  predictions = model(features, training=training)

  # Get both the unconditional losses (the None part)
  # and the input-conditional losses (the features part).
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss=loss_obj(labels, predictions) + tf.math.add_n(reg_losses)

  # Upgrade to tf.keras.metrics.
  accuracy_obj = tf.keras.metrics.Accuracy(name='acc_obj')
  accuracy = accuracy_obj.update_state(
      y_true=labels, y_pred=tf.math.argmax(predictions, axis=1))

  train_op = None
  if training:
    # Upgrade to tf.keras.optimizers.
    optimizer = tf.keras.optimizers.Adam()
    # Manually assign tf.compat.v1.global_step variable to optimizer.iterations
    # to make tf.compat.v1.train.global_step increased correctly.
    # This assignment is a must for any `tf.train.SessionRunHook` specified in
    # estimator, as SessionRunHooks rely on global step.
    optimizer.iterations = tf.compat.v1.train.get_or_create_global_step()
    # Get both the unconditional updates (the None part)
    # and the input-conditional updates (the features part).
    update_ops = model.get_updates_for(None) + model.get_updates_for(features)
    # Compute the minimize_op.
    minimize_op = optimizer.get_updates(
        total_loss,
        model.trainable_variables)[0]
    train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops={'Accuracy': accuracy_obj})

# Create the Estimator & Train.
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.

INFO:tensorflow:Using default config.

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp8g2a8yh1

WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp8g2a8yh1

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp8g2a8yh1', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp8g2a8yh1', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Not using Distribute Coordinator.

INFO:tensorflow:Not using Distribute Coordinator.

INFO:tensorflow:Running training and evaluation locally (non-distributed).

INFO:tensorflow:Running training and evaluation locally (non-distributed).

INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.

INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:loss = 2.4837275, step = 0

INFO:tensorflow:loss = 2.4837275, step = 0

INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp8g2a8yh1/model.ckpt.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2020-03-28T01:53:14Z

INFO:tensorflow:Starting evaluation at 2020-03-28T01:53:14Z

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Restoring parameters from /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Evaluation [1/5]

INFO:tensorflow:Evaluation [1/5]

INFO:tensorflow:Evaluation [2/5]

INFO:tensorflow:Evaluation [2/5]

INFO:tensorflow:Evaluation [3/5]

INFO:tensorflow:Evaluation [3/5]

INFO:tensorflow:Evaluation [4/5]

INFO:tensorflow:Evaluation [4/5]

INFO:tensorflow:Evaluation [5/5]

INFO:tensorflow:Evaluation [5/5]

INFO:tensorflow:Inference Time : 0.96313s

INFO:tensorflow:Inference Time : 0.96313s

INFO:tensorflow:Finished evaluation at 2020-03-28-01:53:15

INFO:tensorflow:Finished evaluation at 2020-03-28-01:53:15

INFO:tensorflow:Saving dict for global step 25: accuracy = 0.553125, global_step = 25, loss = 1.7363777

INFO:tensorflow:Saving dict for global step 25: accuracy = 0.553125, global_step = 25, loss = 1.7363777

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp8g2a8yh1/model.ckpt-25

INFO:tensorflow:Loss for final step: 0.4678836.

INFO:tensorflow:Loss for final step: 0.4678836.

({'accuracy': 0.553125, 'loss': 1.7363777, 'global_step': 25}, [])

Premade Estimators

Premade Estimators in the families of tf.estimator.DNN*, tf.estimator.Linear*, and tf.estimator.DNNLinearCombined* are still supported in the TensorFlow 2.0 API; however, some arguments have changed:

  1. input_layer_partitioner: Removed in 2.0.
  2. loss_reduction: Updated to tf.keras.losses.Reduction instead of tf.compat.v1.losses.Reduction. Its default value also changed, from tf.compat.v1.losses.Reduction.SUM to tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE.
  3. optimizer, dnn_optimizer, and linear_optimizer: These args have been updated to tf.keras.optimizers instead of tf.compat.v1.train.Optimizer.

To migrate across the above changes:

  1. For input_layer_partitioner, no action is needed, since the Distribution Strategy in TF 2.0 handles it automatically.
  2. For loss_reduction, check tf.keras.losses.Reduction for the supported options.
  3. For the optimizer args: if you do not pass in an optimizer, dnn_optimizer, or linear_optimizer arg, or if you specify the optimizer arg as a string in your code, you don't need to change anything; tf.keras.optimizers is used by default. Otherwise, update the arg from tf.compat.v1.train.Optimizer to its corresponding tf.keras.optimizers.
Checkpoint Converter

The migration to keras.optimizers will break checkpoints saved with TF 1.x, as tf.keras.optimizers generates a different set of variables to be saved in checkpoints. To make an old checkpoint reusable after the migration to TF 2.0, try the checkpoint converter tool.

 curl -O https://raw.githubusercontent.com/tensorflow/estimator/master/tensorflow_estimator/python/estimator/tools/checkpoint_converter.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15157  100 15157    0     0  22256      0 --:--:-- --:--:-- --:--:-- 22224

The tool has built-in help:

$ python checkpoint_converter.py -h
2020-03-28 01:53:21.210238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-03-28 01:53:21.210483: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-03-28 01:53:21.210501: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
usage: checkpoint_converter.py [-h]
                               {dnn,linear,combined} source_checkpoint
                               source_graph target_checkpoint

positional arguments:
  {dnn,linear,combined}
                        The type of estimator to be converted. So far, the
                        checkpoint converter only supports Canned Estimator.
                        So the allowed types include linear, dnn and combined.
  source_checkpoint     Path to source checkpoint file to be read in.
  source_graph          Path to source graph file to be read in.
  target_checkpoint     Path to checkpoint file to be written out.

optional arguments:
  -h, --help            show this help message and exit
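
Putting the positional arguments together, converting a DNN estimator checkpoint might look like this (all paths here are hypothetical):

python checkpoint_converter.py dnn /tmp/source.ckpt /tmp/source_graph.pbtxt /tmp/converted.ckpt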

TensorShape

This class was simplified to hold ints, instead of tf.compat.v1.Dimension objects. So there is no need to call .value() to get an int.

Individual tf.compat.v1.Dimension objects are still accessible from tf.TensorShape.dims.

The following demonstrates the differences between TensorFlow 1.x and TensorFlow 2.0.

# Create a shape and choose an index
i = 0
shape = tf.TensorShape([16, None, 256])
shape
TensorShape([16, None, 256])

If you had this in TF 1.x:

value = shape[i].value

Do this in TF 2.0:

value = shape[i]
value
16

If you had this in TF 1.x:

for dim in shape:
    value = dim.value
    print(value)

Do this in TF 2.0:

for value in shape:
  print(value)
16
None
256

If you had this in TF 1.x (or used any other dimension method):

dim = shape[i]
dim.assert_is_compatible_with(other_dim)

Do this in TF 2.0:

other_dim = 16
Dimension = tf.compat.v1.Dimension

if shape.rank is None:
  dim = Dimension(None)
else:
  dim = shape.dims[i]
dim.is_compatible_with(other_dim) # or any other dimension method
True
shape = tf.TensorShape(None)

if shape:
  dim = shape.dims[i]
  dim.is_compatible_with(other_dim) # or any other dimension method

The boolean value of a tf.TensorShape is True if the rank is known, False otherwise.

print(bool(tf.TensorShape([])))      # Scalar
print(bool(tf.TensorShape([0])))     # 0-length vector
print(bool(tf.TensorShape([1])))     # 1-length vector
print(bool(tf.TensorShape([None])))  # Unknown-length vector
print(bool(tf.TensorShape([1, 10, 100])))       # 3D tensor
print(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions
print()
print(bool(tf.TensorShape(None)))  # A tensor with unknown rank.
True
True
True
True
True
True

False

Other changes

  • tf.colocate_with removed: TensorFlow's device placement algorithms have improved significantly, so this should no longer be necessary. If removing it causes a performance degradation, please file a bug.
  • Replace v1.ConfigProto usage with the equivalent functions from tf.config, as in the sketch below.
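
For instance, one common ConfigProto option and its 2.0 counterpart (assuming soft device placement is the option you need):

# TF 1.x:
#   config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
#   sess = tf.compat.v1.Session(config=config)
# TF 2.0:
tf.config.set_soft_device_placement(True)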

Conclusions

The overall process is:

  1. Run the upgrade script.
  2. Remove contrib symbols.
  3. Switch your models to an object-oriented style (Keras).
  4. Use tf.keras or tf.estimator training and evaluation loops where you can.
  5. Otherwise, use custom loops, but be sure to avoid sessions and collections.

The changes above take a bit of extra work, but each one results in:

  • Fewer lines of code
  • Increased clarity and simplicity
  • Easier debugging

Older versions

July 7, 2019 version
