Converting TensorFlow 1.X code to TensorFlow 2.0


License: CC 4.0 BY-SA

Original link: https://www.tensorflow.org/beta/guide/migration_guide


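This article explains how to migrate TensorFlow 1.x code to TensorFlow 2.0. It covers the automatic conversion script, converting to an object-oriented style, updating the model training workflow, and using Estimators, to help developers quickly learn the new version's features and best practices.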


Latest version (March 31, 2020): https://blog.csdn.net/xovee/article/details/104766718

Original: Convert Your Existing Code to TensorFlow 2.0
Translator: Xovee
License: Creative Commons Attribution 4.0 License
Translated: July 7, 2019
Applies to: TensorFlow 2.0 Beta

Main text

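In TensorFlow 2.0, your old code may still run unmodified (except for contrib):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

However, this does not let you take advantage of the many improvements in TensorFlow 2.0. This guide will help you upgrade your code, making it simpler, more performant, and easier to maintain.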

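Automatic conversion script

The first step is to try running the upgrade script.

This does an initial pass at upgrading your code to TensorFlow 2.0, but it cannot make your code idiomatic to 2.0. Your code may still use tf.compat.v1 endpoints to access placeholders, sessions, collections, and other 1.x-style functionality.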

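Making the code 2.0-native

This guide walks through several examples of converting TensorFlow 1.x code to TensorFlow 2.0. These changes let your code take advantage of performance optimizations and simplified API calls.

In each case, the pattern is:

1. Replace `tf.Session.run` calls

Every tf.Session.run call should be replaced by a Python function.

The feed_dict and tf.placeholders become function arguments.
The fetches become the function's return value.

You can now step through and debug the function with standard Python tools such as pdb.

Once it works, add a tf.function decorator to make it run efficiently in graph mode. See the Autograph Guide for more details.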
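A minimal sketch of this pattern, assuming TensorFlow 2.x is installed. The function name scale_and_shift is hypothetical, and the commented-out lines are only a rough 1.x equivalent for comparison.

```python
import tensorflow as tf

# Rough TF 1.x equivalent, for comparison only (not executed):
#   x = tf.placeholder(tf.float32, shape=[None])
#   y = 2.0 * x + 1.0
#   with tf.Session() as sess:
#       out = sess.run(y, feed_dict={x: [1.0, 2.0]})

@tf.function  # optional: traces the Python function into a graph
def scale_and_shift(x):
    # The placeholder became the argument `x`;
    # the fetch `y` became the return value.
    return 2.0 * x + 1.0

out = scale_and_shift(tf.constant([1.0, 2.0]))
print(out.numpy())  # [3. 5.]
```

Removing the decorator gives the same result, which is convenient when stepping through the function with pdb.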

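2. Use Python objects to track variables and losses

Use tf.Variable instead of tf.get_variable.

Every variable_scope can be converted to a Python object. Typically this will be one of:

tf.keras.layers.Layer
tf.keras.Model
tf.Module

If you need to aggregate lists of variables (like tf.Graph.get_collection(tf.GraphKeys.VARIABLES)), use the .variables and .trainable_variables attributes of the Layer and Model objects.

These Layer and Model classes implement several other properties that remove the need for global collections. Their .losses property can replace the tf.GraphKeys.LOSSES collection.

See the Keras guide for details.

Warning: many tf.compat.v1 symbols use the global collections implicitly.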
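A minimal sketch, assuming TensorFlow 2.x: a tf.Module stands in for a variable_scope block, and a Keras Dense layer shows .losses collecting the regularization loss that used to live in a global collection. The class name Affine is hypothetical.

```python
import tensorflow as tf

# tf.Module as a stand-in for a `with tf.variable_scope("affine"):` block.
class Affine(tf.Module):
    def __init__(self, in_dim, out_dim, name=None):
        super().__init__(name=name)
        self.w = tf.Variable(tf.ones([in_dim, out_dim]), name="w")
        self.b = tf.Variable(tf.zeros([out_dim]), name="b")

    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b

affine = Affine(3, 2, name="affine")
y = affine(tf.ones([1, 3]))
print(y.numpy())                         # [[3. 3.]]
print(len(affine.trainable_variables))   # 2: tracked via attributes, no collection

# Keras layers also collect regularization losses in `.losses`.
dense = tf.keras.layers.Dense(2, kernel_initializer="ones",
                              kernel_regularizer=tf.keras.regularizers.l2(0.01))
dense(tf.ones([1, 3]))                   # variables are created on the first call
print(len(dense.trainable_variables))    # 2 (kernel and bias)
print(float(tf.add_n(dense.losses)))     # about 0.06 = 0.01 * sum(ones(3, 2) ** 2)
```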

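3. Upgrade your training loops

Use the highest-level API that works for your use case. Prefer tf.keras.Model.fit over building your own training loops.

These high-level functions manage many low-level details that are easy to miss if you write your own training loop. For example, they automatically collect the regularization losses and set the training=True argument when calling the model.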
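A minimal sketch of the high-level path, assuming TensorFlow 2.x and NumPy; the random data and layer sizes are illustrative only. Note that fit picks up the kernel_regularizer loss without any extra code.

```python
import numpy as np
import tensorflow as tf

# Illustrative random data: 64 examples, 4 features, 2 classes.
rng = np.random.default_rng(0)
x = rng.random((64, 4)).astype("float32")
y = (x.sum(axis=1) > 2.0).astype("int32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# fit() handles batching, training=True, and the regularization loss.
history = model.fit(x, y, epochs=2, batch_size=16, verbose=0)
print(history.history["loss"])
```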

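4. Upgrade your data input pipelines

Use tf.data datasets for data input. These objects are efficient, expressive, and integrate well with TensorFlow.

They can be passed directly to the tf.keras.Model.fit method:

model.fit(dataset, epochs=5)

They can be iterated over directly with standard Python: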

for example_batch, label_batch in dataset:
	break
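A small self-contained sketch, assuming TensorFlow 2.x; the toy tensors are made up, and from_tensor_slices is just one of several ways to build a dataset.

```python
import tensorflow as tf

# Ten toy examples with alternating 0/1 labels.
features = tf.range(10, dtype=tf.float32)
labels = tf.range(10) % 2
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

# Plain Python iteration; the last batch is smaller (10 = 4 + 4 + 2).
batch_sizes = []
for example_batch, label_batch in dataset:
    batch_sizes.append(int(example_batch.shape[0]))
print(batch_sizes)  # [4, 4, 2]
```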

Converting models

Setup

from __future__ import absolute_import, division, print_function, unicode_literals
!pip install -q tensorflow==2.0.0-beta1
import tensorflow as tf

import tensorflow_datasets as tfds

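Low-level variables & operator execution

Examples of low-level API use include:

using variable scopes to control reuse
creating variables with tf.get_variable
accessing collections explicitly
accessing collections implicitly with methods like:
- tf.global_variables
- tf.losses.get_regularization_loss
using tf.placeholder to set up graph inputs
executing graphs with session.run
initializing variables manually

Before converting

Here is what these patterns may look like in TensorFlow 1.x code: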

in_a = tf.placeholder(dtype=tf.float32, shape=(2))
in_b = tf.placeholder(dtype=tf.float32, shape=(2))

def forward(x):
	with tf.variable_scope("matmul", reuse=tf.AUTO_REUSE):
		W = tf.get_variable("W", initializer=tf.ones(shape=(2,2)), 
							regularizer=tf.contrib.layers.l2_regularizer(0.04))
		b = tf.get_variable("b", initializer=tf.zeros(shape=(2)))
		return W * x + b

out_a = forward(in_a)
out_b = forward(in_b)

reg_loss = tf.losses.get_regularization_loss(scope="matmul")

with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
	outs = sess.run([out_a, out_b, reg_loss], 
					feed_dict={in_a: [1, 0], in_b: [0, 1]})

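After converting

In the converted code:

The variables are local Python objects.
The forward function still defines the calculation.
The sess.run call is replaced with a call to forward.
The optional tf.function decorator can be added for performance.
The regularizations are calculated manually, without referring to any global collection.
No sessions and no placeholders!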

W = tf.Variable(tf.ones(shape=(2,2)), name="W")
b = tf.Variable(tf.zeros(shape=(2)), name="b")

@tf.function
def forward(x):
	return W * x + b

out_a = forward([1, 0])
print(out_a)

tf.Tensor(
[[1.  0.]
 [1.  0.]], shape=(2, 2), dtype=float32)

out_b = forward([0,1])

regularizer = tf.keras.regularizers.l2(0.04)
reg_loss = regularizer(W)

Models based on `tf.layers`

The tf.layers module contained layer functions that relied on tf.variable_scope to define and reuse variables.

Before converting

def model(x, training, scope='model'):
	with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
		x = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu, 
							 kernel_regularizer=tf.contrib.layers.l2_regularizer(0.04))
		x = tf.layers.max_pooling2d(x, (2, 2), 1)
		x = tf.layers.flatten(x)
		x = tf.layers.dropout(x, 0.1, training=training)
		x = tf.layers.dense(x, 64, activation=tf.nn.relu)
		x = tf.layers.batch_normalization(x, training=training)
		x = tf.layers.dense(x, 10, activation=tf.nn.softmax)
		return x

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

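After converting

The simple stack of layers fits neatly into tf.keras.Sequential. (For more complex models, see custom layers and models, and the functional API.)
The model tracks the variables and regularization losses.
The conversion is one-to-one because there is a direct mapping from tf.layers to tf.keras.layers.

Most arguments stayed the same, but notice the differences:

The training argument is passed to each layer by the model when it runs.
The first argument to the original model function (the input x) is gone. This is because object layers separate building the model from calling the model.

Also note that:

If you were using regularizers or initializers from tf.contrib, these have more argument changes than others.
The code no longer writes to collections, so functions like tf.losses.get_regularization_loss will no longer return these values, potentially breaking your training loops.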

model = tf.keras.Sequential([
	tf.keras.layers.Conv2D(32, 3, activation='relu', 
						   kernel_regularizer=tf.keras.regularizers.l2(0.04), 
						   input_shape=(28, 28, 1)), 
	tf.keras.layers.MaxPooling2D(), 
	tf.keras.layers.Flatten(), 
	tf.keras.layers.Dropout(0.1), 
	tf.keras.layers.Dense(64, activation='relu'), 
	tf.keras.layers.BatchNormalization(), 
	tf.keras.layers.Dense(10, activation='softmax')
])

train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))

train_out = model(train_data, training=True)
print(train_out)

tf.Tensor([[0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1]], shape=(1, 10), dtype=float32)

test_out = model(test_data, training=False)
print(test_out)

tf.Tensor(
[[0.05971449 0.06920086 0.08718181 0.06443588 0.12245757 0.08628429
  0.09619197 0.12731078 0.10284208 0.18438028]], shape=(1, 10), dtype=float32)

# Here are all the trainable variables
len(model.trainable_variables)

# Here is the regularization loss
model.losses

[<tf.Tensor: id=400, shape=(), dtype=float32, numpy=0.07110846>]

Mixing variables & `tf.layers`

Existing code often mixes lower-level TF 1.x variables and operations with higher-level tf.layers.

Before converting

def model(x, training, scope='model'):
	with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
		W = tf.get_variable(
			"W", dtype=tf.float32, 
			initializer=tf.ones(shape=x.shape), 
			regularizer=tf.contrib.layers.l2_regularizer(0.04), 
			trainable=True)
		if training:
			x = x + W
		else:
			x = x + W * 0.5
		x = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu)
		x = tf.layers.max_pooling2d(x, (2, 2), 1)
		x = tf.layers.flatten(x)
		return x

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

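After converting

To convert this code, follow the pattern of mapping layers to layers as in the previous example.

A tf.variable_scope is effectively a layer of its own, so rewrite it as a tf.keras.layers.Layer. See the guide for details.

The general pattern is:

Collect layer parameters in __init__.
Build the variables in build.
Execute the calculations in call, and return the result.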

# Create a custom layer
class CustomLayer(tf.keras.layers.Layer):
	def __init__(self, *args, **kwargs):
		super(CustomLayer, self).__init__(*args, **kwargs)
	
	def build(self, input_shape):
		self.w = self.add_weight(
			shape=input_shape[1:], 
			dtype=tf.float32, 
			initializer=tf.keras.initializers.ones(), 
			regularizer=tf.keras.regularizers.l2(0.02), 
			trainable=True)

	# The call method is sometimes run in graph mode, where training may be a tensor
	@tf.function
	def call(self, inputs, training=None):
		if training:
			return inputs + self.w
		else:
			return inputs + self.w * 0.5

custom_layer = CustomLayer()
print(custom_layer([1]).numpy())
print(custom_layer([1], training=True).numpy())

[1.5]
[2.]

train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))

# Build a model that includes the custom layer
model = tf.keras.Sequential([
	CustomLayer(input_shape=(28, 28, 1)), 
	tf.keras.layers.Conv2D(32, 3, activation='relu'), 
	tf.keras.layers.MaxPooling2D(), 
	tf.keras.layers.Flatten(), 
])

train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

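Some things to note:

Subclassed Keras models and layers need to run both in v1 graphs (no automatic control dependencies) and in eager mode:
- Wrap call() in a tf.function() to get autograph and automatic control dependencies
Don't forget to accept a training argument in call:
- Sometimes it is a tf.Tensor
- Sometimes it is a Python boolean (True or False)
Don't keep tf.Tensors in your objects:
- They might be created either in a tf.function or in the eager context, and these tensors behave differently
- Use tf.Variables for state; they are usable in both contexts
- tf.Tensors are only for intermediate values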
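A minimal sketch of the last point, assuming TensorFlow 2.x; the Counter class is hypothetical. Because the state is a tf.Variable, the same object works whether increment runs eagerly or, as here, inside a tf.function.

```python
import tensorflow as tf

class Counter(tf.Module):
    def __init__(self):
        super().__init__()
        # State lives in a tf.Variable, not in a tf.Tensor attribute.
        self.count = tf.Variable(0, dtype=tf.int32, trainable=False)

    @tf.function
    def increment(self):
        self.count.assign_add(1)
        return self.count.read_value()

c = Counter()
c.increment()
c.increment()
print(int(c.count.numpy()))  # 2
```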

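A note on `Slim` and `contrib.layers`

A lot of older TensorFlow 1.x code uses the Slim library, which was packaged with TensorFlow 1.x as tf.contrib.layers. As a contrib module, it is no longer available in TensorFlow 2.0, not even through tf.compat.v1. Converting code that uses Slim is more involved than converting code that uses tf.layers. In fact, it may make sense to convert your Slim code to tf.layers first, and then convert it to Keras.

Remove arg_scopes; all arguments need to be explicit.
If you use normalizer_fn and activation_fn, split them into their own layers.
Separable conv layers map to one or more different Keras layers (depthwise, pointwise, and separable Keras layers).
Slim and tf.layers have different argument names and default values.
Some arguments have different scales.
If you use Slim pre-trained models, try out tf.keras.applications or TF Hub.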
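One possible way to replace a slim.arg_scope, assuming TensorFlow 2.x: bind the shared defaults explicitly with functools.partial. This is a general Python technique rather than an official migration API, and the Conv name and hyperparameters are illustrative.

```python
import functools

import tensorflow as tf

# Shared defaults that an arg_scope used to inject, now written out explicitly.
Conv = functools.partial(tf.keras.layers.Conv2D,
                         padding="same",
                         activation="relu",
                         kernel_regularizer=tf.keras.regularizers.l2(0.04))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    Conv(16, 3),
    Conv(32, 3),
    tf.keras.layers.Flatten(),
])
out = model(tf.ones([1, 28, 28, 1]))
print(out.shape)  # (1, 25088): "same" padding keeps 28x28, times 32 filters
```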

Some tf.contrib layers might not have been moved to core TensorFlow, but have instead been moved to the TF add-ons package.

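Training

There are many ways to feed data to a tf.keras model, for example Python generators and Numpy arrays.

The recommended way to feed data to a model is the tf.data package, which contains a collection of high-performance classes for manipulating data.

If you are still using tf.queue, note that queues are now only supported as data structures, not as input pipelines.

Using datasets

The TensorFlow Datasets package (tfds) contains utilities for loading predefined datasets. For example, to load the MNIST dataset with tfds: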

datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']


Downloading and preparing dataset mnist (11.06 MiB) to /home/kbuilder/tensorflow_datasets/mnist/1.0.0...

HBox(children=(IntProgress(value=1, bar_style='info', description='Dl Completed...', max=1, style=ProgressStyl…
HBox(children=(IntProgress(value=1, bar_style='info', description='Dl Size...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=1, bar_style='info', description='Extraction completed...', max=1, style=Prog…
/home/kbuilder/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
/home/kbuilder/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
/home/kbuilder/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
/home/kbuilder/.local/lib/python3.5/site-packages/urllib3/connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)






HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


HBox(children=(IntProgress(value=0, description='Shuffling...', max=10, style=ProgressStyle(description_width=…
WARNING: Logging before flag parsing goes to stderr.
W0622 00:56:22.375359 139656236431104 deprecation.py:323] From /home/kbuilder/.local/lib/python3.5/site-packages/tensorflow_datasets/core/file_format_adapter.py:209: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`

HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=6000, style=ProgressStyle(description_width=…


HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))


HBox(children=(IntProgress(value=0, description='Shuffling...', max=1, style=ProgressStyle(description_width='…
HBox(children=(IntProgress(value=1, bar_style='info', description='Reading...', max=1, style=ProgressStyle(des…
HBox(children=(IntProgress(value=0, description='Writing...', max=10000, style=ProgressStyle(description_width…
Dataset mnist downloaded and prepared to /home/kbuilder/tensorflow_datasets/mnist/1.0.0. Subsequent calls will reuse this data.

Then prepare the data for training:

Re-scale each image.
Shuffle the order of the examples.
Collect batches of images and labels.

BUFFER_SIZE = 10 # Use a much larger value for real code.
BATCH_SIZE = 64
NUM_EPOCHS = 5

def scale(image, label):
	image = tf.cast(image, tf.float32)
	image /= 255
	
	return image, label

To keep the example short, trim the dataset to only return 5 batches:

train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
test_data = mnist_test.map(scale).batch(BATCH_SIZE)

STEPS_PER_EPOCH = 5

train_data = train_data.take(STEPS_PER_EPOCH)
test_data = test_data.take(STEPS_PER_EPOCH)

image_batch, label_batch = next(iter(train_data))
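To try the preprocessing steps without downloading MNIST, here is a sketch that runs the same scale, shuffle, batch, and take chain on synthetic data, assuming TensorFlow 2.x and NumPy. The random arrays are stand-ins, not MNIST.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for MNIST: 200 random 28x28x1 uint8 "images".
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 28, 28, 1), dtype=np.uint8)
labels = rng.integers(0, 10, size=(200,), dtype=np.int64)

def scale(image, label):
    # Convert to float and map pixel values from [0, 255] to [0, 1].
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

demo_data = (tf.data.Dataset.from_tensor_slices((images, labels))
             .map(scale)
             .shuffle(10)
             .batch(64)
             .take(2))  # keep only 2 batches, like .take(STEPS_PER_EPOCH)

image_batch, label_batch = next(iter(demo_data))
print(image_batch.shape, float(tf.reduce_max(image_batch)))
```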

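Using Keras training loops

If you don't need low-level control of your training process, the built-in Keras fit, evaluate, and predict methods are recommended. These methods provide a uniform interface to train the model regardless of how it is implemented (sequential, functional, or subclassed).

The advantages of these methods include:

They accept Numpy arrays, Python generators, and tf.data.Datasets.
They apply regularization and activation losses automatically.
They support arbitrary callables as losses and metrics.
They support callbacks like tf.keras.callbacks.TensorBoard, as well as custom callbacks.
They are performant, automatically using TensorFlow graphs.

Here is an example of training a model using a Dataset. (For details on how this works, see the tutorials.)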

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Model is the full model w/o custom layers
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, epochs=NUM_EPOCHS)
loss, acc = model.evaluate(test_data)

print("Loss {}, Accuracy {}".format(loss, acc))


Epoch 1/5
5/5 [==============================] - 1s 240ms/step - loss: 1.6412 - accuracy: 0.4875
Epoch 2/5
5/5 [==============================] - 0s 23ms/step - loss: 0.5723 - accuracy: 0.8406
Epoch 3/5
5/5 [==============================] - 0s 26ms/step - loss: 0.4360 - accuracy: 0.9156
Epoch 4/5
5/5 [==============================] - 0s 27ms/step - loss: 0.3099 - accuracy: 0.9438
Epoch 5/5
5/5 [==============================] - 0s 23ms/step - loss: 0.2429 - accuracy: 0.9594
      5/Unknown - 0s 45ms/step - loss: 1.6050 - accuracy: 0.6906
Loss 1.6050187349319458, Accuracy 0.690625011920929

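Writing your own loop

If the Keras model's training step works for you but you need more control outside that step, consider using the tf.keras.Model.train_on_batch method in your own data-iteration loop.

Remember: many things can be implemented as a tf.keras.callbacks.Callback.

This method has many of the advantages of the methods mentioned in the previous section, but gives you control of the outer loop.

You can also use tf.keras.Model.test_on_batch or tf.keras.Model.evaluate to check performance during training.

Note: by default, train_on_batch and test_on_batch return the loss and metrics for a single batch. If you pass reset_metrics=False, they return accumulated metrics, and you must remember to reset the metric accumulators appropriately. Also remember that some metrics, such as AUC, require reset_metrics=False to be calculated correctly.

To continue training the model above: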

# Model is the full model w/o custom layers
model.compile(optimizer='adam', 
			  loss='sparse_categorical_crossentropy', 
			  metrics=['accuracy'])
metrics_names = model.metrics_names

for epoch in range(NUM_EPOCHS):
	# Reset the metric accumulators
	model.reset_metrics()
	for image_batch, label_batch in train_data:
		result = model.train_on_batch(image_batch, label_batch)
		print("train: ", 
			  "{}: {:.3f}".format(metrics_names[0], result[0]), 
			  "{}: {:.3f}".format(metrics_names[1], result[1]))
	for image_batch, label_batch in test_data:
		result = model.test_on_batch(image_batch, label_batch, 
									 # Return accumulated metrics
									 reset_metrics=False)
	print("\neval: ", 
		  "{}: {:.3f}".format(metrics_names[0], result[0]), 
		  "{}: {:.3f}".format(metrics_names[1], result[1]))

train:  loss: 0.232 accuracy: 0.984
train:  loss: 0.229 accuracy: 0.984
train:  loss: 0.282 accuracy: 0.922
train:  loss: 0.219 accuracy: 1.000
train:  loss: 0.368 accuracy: 0.906

eval:  loss: 1.589 accuracy: 0.738
train:  loss: 0.113 accuracy: 1.000
train:  loss: 0.118 accuracy: 1.000
train:  loss: 0.116 accuracy: 1.000
train:  loss: 0.155 accuracy: 1.000
train:  loss: 0.231 accuracy: 0.969

eval:  loss: 1.574 accuracy: 0.769
train:  loss: 0.101 accuracy: 1.000
train:  loss: 0.098 accuracy: 1.000
train:  loss: 0.105 accuracy: 1.000
train:  loss: 0.114 accuracy: 1.000
train:  loss: 0.128 accuracy: 1.000

eval:  loss: 1.561 accuracy: 0.803
train:  loss: 0.090 accuracy: 1.000
train:  loss: 0.067 accuracy: 1.000
train:  loss: 0.084 accuracy: 1.000
train:  loss: 0.082 accuracy: 1.000
train:  loss: 0.098 accuracy: 0.984

eval:  loss: 1.548 accuracy: 0.822
train:  loss: 0.066 accuracy: 1.000
train:  loss: 0.069 accuracy: 1.000
train:  loss: 0.068 accuracy: 1.000
train:  loss: 0.087 accuracy: 1.000
train:  loss: 0.163 accuracy: 0.953

eval:  loss: 1.551 accuracy: 0.825

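Customizing the training step

If you need more flexibility and control, you can get it by implementing your own training loop.
There are three steps:

Iterate over a Python generator or tf.data.Dataset to get batches of examples.
Use tf.GradientTape to collect gradients.
Use a tf.keras.optimizers optimizer to apply weight updates to the model's variables.

Remember:

Always include a training argument on the call method of subclassed layers and models.
Make sure to call the model with the training argument set correctly.
Depending on usage, model variables may not exist until the model is run on a batch of data.
You need to manually handle things like regularization losses for the model.

Note the simplifications relative to v1:

There is no need to run variable initializers. Variables are initialized on creation.
There is no need to add manual control dependencies. Even in tf.function, operations act as they do in eager mode.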

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation='softmax')
])

optimizer = tf.keras.optimizers.Adam(0.001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

@tf.function
def train_step(inputs, labels):
	with tf.GradientTape() as tape:
		predictions = model(inputs, training=True)
		regularization_loss = tf.math.add_n(model.losses)
		pred_loss = loss_fn(labels, predictions)
		total_loss = pred_loss + regularization_loss

	gradients = tape.gradient(total_loss, model.trainable_variables)
	optimizer.apply_gradients(zip(gradients, model.trainable_variables))

for epoch in range(NUM_EPOCHS):
	for inputs, labels in train_data:
		train_step(inputs, labels)
	print("Finished epoch", epoch)

Finished epoch 0
Finished epoch 1
Finished epoch 2
Finished epoch 3
Finished epoch 4
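The same three steps can also be shown without any Keras layers. A minimal sketch, assuming TensorFlow 2.x: it fits y = 3x + 2 with two tf.Variables and, to stay minimal, applies a plain gradient-descent update with assign_sub instead of a tf.keras.optimizers optimizer. The data and learning rate are illustrative.

```python
import tensorflow as tf

# Noise-free toy data for y = 3x + 2, fed in batches of 16.
x = tf.linspace(-1.0, 1.0, 64)
y = 3.0 * x + 2.0
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)

w = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.1

@tf.function
def train_step(xb, yb):
    # Step 2: record the forward pass and compute gradients.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * xb + b - yb))
    grad_w, grad_b = tape.gradient(loss, [w, b])
    # Step 3: update the variables (plain gradient descent).
    w.assign_sub(learning_rate * grad_w)
    b.assign_sub(learning_rate * grad_b)
    return loss

# Step 1: iterate over batches.
for epoch in range(100):
    for xb, yb in dataset:
        loss = train_step(xb, yb)

print(w.numpy(), b.numpy())  # close to 3.0 and 2.0
```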

Metrics

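In TensorFlow 2.0, metrics are objects. Metric objects work both eagerly and in tf.functions. A metric object has the following methods:

update_state(): add new observations
result(): get the current result of the metric, given the observed values
reset_states(): clear all observations

The object itself is callable. Calling it updates the state with new observations, as update_state does, and returns the new result of the metric.

You don't have to manually initialize a metric's variables, and because TensorFlow 2.0 has automatic control dependencies, you don't need to worry about those either.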
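A quick illustration of these methods, assuming TensorFlow 2.x; the values are arbitrary. Calling the metric object is equivalent to update_state followed by result.

```python
import tensorflow as tf

mean = tf.keras.metrics.Mean(name="demo_mean")
mean.update_state([1.0, 2.0, 3.0])   # three observations
print(float(mean.result()))          # 2.0

# Calling the object adds the observation 5.0 and returns the new mean:
# (1 + 2 + 3 + 5) / 4 = 2.75
print(float(mean(5.0)))              # 2.75
```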

The code below uses metrics to keep track of the mean loss and the accuracy observed within a custom training loop.

# Create the metrics
loss_metric = tf.keras.metrics.Mean(name='train_loss')
accuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

@tf.function
def train_step(inputs, labels):
	with tf.GradientTape() as tape:
		predictions = model(inputs, training=True)
		regularization_loss = tf.math.add_n(model.losses)
		pred_loss = loss_fn(labels, predictions)
		total_loss = pred_loss + regularization_loss

	gradients = tape.gradient(total_loss, model.trainable_variables)
	optimizer.apply_gradients(zip(gradients, model.trainable_variables))
	# Update the metrics
	loss_metric.update_state(total_loss)
	accuracy_metric.update_state(labels, predictions)

for epoch in range(NUM_EPOCHS):
	# Reset the metrics
	loss_metric.reset_states()
	accuracy_metric.reset_states()

	for inputs, labels in train_data:
		train_step(inputs, labels)
	# Get the metric results
	mean_loss = loss_metric.result()
	mean_accuracy = accuracy_metric.result()

	print('Epoch: ', epoch)
	print('  loss:     {:.3f}'.format(mean_loss))
	print('  accuracy: {:.3f}'.format(mean_accuracy))

Epoch:  0
  loss:     0.224
  accuracy: 0.981
Epoch:  1
  loss:     0.187
  accuracy: 0.984
Epoch:  2
  loss:     0.150
  accuracy: 0.997
Epoch:  3
  loss:     0.132
  accuracy: 1.000
Epoch:  4
  loss:     0.125
  accuracy: 0.991

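Saving & loading

Checkpoint compatibility

TensorFlow 2.0 uses object-based checkpoints.

Old-style name-based checkpoints can still be loaded, if you're careful. The code conversion process may result in variable name changes, but there are workarounds.

The simplest approach is to line up the names of the new model with the names in the checkpoint:

Variables still all have a name argument you can set.
Keras models also take a name argument, which they set as the prefix for their variables.
The tf.name_scope function can be used to set variable name prefixes. This is very different from tf.variable_scope: it only affects names, and does not track variables or reuse.

If that does not work for your use case, try the tf.compat.v1.train.init_from_checkpoint function. It takes an assignment_map argument, which specifies the mapping from old names to new names.

Note: unlike object-based checkpoints, which can defer loading, name-based checkpoints require that all variables be built when the function is called. Some models defer building variables until you call build or run the model on a batch of data.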
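For contrast with the name-based approach, here is a minimal sketch of an object-based checkpoint round trip, assuming TensorFlow 2.x. The key my_var and the temporary directory are illustrative; values are matched through the object graph, not through variable names.

```python
import os
import tempfile

import tensorflow as tf

v = tf.Variable(1.0)
ckpt = tf.train.Checkpoint(my_var=v)  # "my_var" names the edge in the object graph
save_path = ckpt.save(os.path.join(tempfile.mkdtemp(), "ckpt"))

v.assign(42.0)           # overwrite the value
ckpt.restore(save_path)  # restores 1.0 into the same Python object
print(v.numpy())         # 1.0
```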

Saving

There are no significant compatibility concerns for saved models.

TensorFlow 1.x saved_models work in TensorFlow 2.0.
TensorFlow 2.0 saved_models even load and run in TensorFlow 1.x, if all the ops are supported.
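A minimal SavedModel round trip in TensorFlow 2.x, as a sketch; the Doubler module is hypothetical. The input_signature lets the exported function be called after loading without the original Python class.

```python
import tempfile

import tensorflow as tf

class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def __call__(self, x):
        return 2.0 * x

export_dir = tempfile.mkdtemp()
tf.saved_model.save(Doubler(), export_dir)

restored = tf.saved_model.load(export_dir)  # no Doubler class needed here
print(restored(tf.constant([1.0, 2.0])).numpy())  # [2. 4.]
```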

Estimators

Training with Estimators

Estimators are supported in TensorFlow 2.0.

When you use estimators, you can use input_fn(), tf.estimator.TrainSpec, and tf.estimator.EvalSpec from TensorFlow 1.x.

Here is an example using input_fn with train and evaluate specs.

# Define the estimator's input_fn
def input_fn():
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  BUFFER_SIZE = 10000
  BATCH_SIZE = 64

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255

    return image, label[..., tf.newaxis]

  train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
  return train_data.repeat()

# Define train & eval specs
train_spec = tf.estimator.TrainSpec(input_fn=input_fn,
                                    max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn,
                                  steps=STEPS_PER_EPOCH)

Using a Keras model definition

There are some differences in how to construct your estimators in TensorFlow 2.0.

We recommend that you define your model using Keras, then use the tf.keras.estimator.model_to_estimator utility to turn your model into an estimator. The code below shows how to use this utility when creating and training an estimator.

def make_model():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu',
                               kernel_regularizer=tf.keras.regularizers.l2(0.02),
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.1),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

model = make_model()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

estimator = tf.keras.estimator.model_to_estimator(
  keras_model = model
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

W0622 00:56:36.432056 139656236431104 estimator.py:1811] Using temporary folder as model directory: /tmp/tmppr3a9ssk

({'accuracy': 0.68125, 'global_step': 25, 'loss': 1.7631041}, [])

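Using a custom `model_fn`

If you have an existing custom estimator model_fn that you need to maintain, you can convert your model_fn to use a Keras model.

However, for compatibility reasons, a custom model_fn will still run in 1.x-style graph mode. This means there is no eager execution and no automatic control dependencies.

Custom model_fn with minimal changes

To make your custom model_fn work in TF 2.0 with minimal changes to the existing code, you can keep using tf.compat.v1 symbols such as optimizers and metrics.

Using a Keras model in a custom model_fn is similar to using it in a custom training loop:

Set the training phase appropriately, based on the mode argument.
Explicitly pass the model's trainable_variables to the optimizer.

But there are important differences relative to a custom loop:

Instead of using model.losses, extract the losses using tf.keras.Model.get_losses_for.
Extract the model's updates using tf.keras.Model.get_updates_for.

Note: "Updates" are changes that need to be applied to a model after each batch. For example, the moving averages of the mean and variance in a tf.keras.layers.BatchNormalization layer.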

def my_model_fn(features, labels, mode):
  model = make_model()

  optimizer = tf.compat.v1.train.AdamOptimizer()
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  predictions = model(features, training=training)

  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_fn(labels, predictions) + tf.math.add_n(reg_losses)

  accuracy = tf.compat.v1.metrics.accuracy(labels=labels,
                                           predictions=tf.math.argmax(predictions, axis=1),
                                           name='acc_op')

  update_ops = model.get_updates_for(None) + model.get_updates_for(features)
  minimize_op = optimizer.minimize(
      total_loss,
      var_list=model.trainable_variables,
      global_step=tf.compat.v1.train.get_or_create_global_step())
  train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op, eval_metric_ops={'accuracy': accuracy})

# Create the Estimator & Train
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

W0622 00:56:42.734743 139656236431104 estimator.py:1811] Using temporary folder as model directory: /tmp/tmpiiqepq8n

({'accuracy': 0.64375, 'global_step': 25, 'loss': 1.5705757}, [])

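Custom model_fn with TF 2.0 symbols

If you want to get rid of all TF 1.x symbols and upgrade your custom model_fn to native TF 2.0, you need to update the optimizer and metrics to tf.keras.optimizers and tf.keras.metrics.

In the custom model_fn, besides the changes above, more upgrades need to be made:

Use tf.keras.optimizers instead of tf.compat.v1.train.Optimizer.
Explicitly pass the model's trainable_variables to the tf.keras.optimizers optimizer.
To compute the train_op/minimize_op:
- If the loss is a scalar loss Tensor (not a callable), use optimizer.get_updates(). The first element of the returned list is the desired train_op/minimize_op.
- If the loss is a callable (such as a function), use optimizer.minimize() to get the train_op/minimize_op.
Use tf.keras.metrics instead of tf.compat.v1.metrics for evaluation.

For the my_model_fn example above, the migrated code with 2.0 symbols looks like this: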

def my_model_fn(features, labels, mode):
  model = make_model()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  loss_obj = tf.keras.losses.SparseCategoricalCrossentropy()
  predictions = model(features, training=training)

  # Get both the unconditional losses (the None part)
  # and the input-conditional losses (the features part).
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_obj(labels, predictions) + tf.math.add_n(reg_losses)

  # Upgrade to tf.keras.metrics.
  accuracy_obj = tf.keras.metrics.Accuracy(name='acc_obj')
  accuracy = accuracy_obj.update_state(
      y_true=labels, y_pred=tf.math.argmax(predictions, axis=1))

  train_op = None
  if training:
    # Upgrade to tf.keras.optimizers.
    optimizer = tf.keras.optimizers.Adam()
    # Manually assign tf.compat.v1.global_step variable to optimizer.iterations
    # to make tf.compat.v1.train.global_step increased correctly.
    # This assignment is a must for any `tf.train.SessionRunHook` specified in
    # estimator, as SessionRunHooks rely on global step.
    optimizer.iterations = tf.compat.v1.train.get_or_create_global_step()
    # Get both the unconditional updates (the None part)
    # and the input-conditional updates (the features part).
    update_ops = model.get_updates_for(None) + model.get_updates_for(features)
    # Compute the minimize_op.
    minimize_op = optimizer.get_updates(
        total_loss,
        model.trainable_variables)[0]
    train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops={'Accuracy': accuracy_obj})

# Create the Estimator & Train.
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

W0622 00:56:48.755105 139656236431104 estimator.py:1811] Using temporary folder as model directory: /tmp/tmpd_1tlnyx

({'Accuracy': 0.690625, 'global_step': 25, 'loss': 1.6694409}, [])

Premade Estimators

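Premade Estimators in the tf.estimator.DNN*, tf.estimator.Linear*, and tf.estimator.DNNLinearCombined* families are still supported in the TensorFlow 2.0 API, but some arguments have changed:

input_layer_partitioner: removed.
loss_reduction: updated to tf.keras.losses.Reduction instead of tf.compat.v1.losses.Reduction. Its default value also changed from tf.compat.v1.losses.Reduction.SUM to tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE.
optimizer, dnn_optimizer, and linear_optimizer: these arguments have been updated from tf.compat.v1.train.Optimizer to tf.keras.optimizers.

To migrate the changes above: 1. No change is needed for input_layer_partitioner, since Distribution Strategy handles it automatically in TF 2.0. 2. For loss_reduction, check tf.keras.losses.Reduction for the supported options. 3. For the optimizer arguments: if you do not pass in an optimizer, dnn_optimizer, or linear_optimizer argument, or if you specify it as a string in your code, you don't need to change anything, because tf.keras.optimizers is used by default. Otherwise, you need to update it from tf.compat.v1.train.Optimizer to the corresponding tf.keras.optimizers class.

Checkpoint converter

Migrating the optimizer will break TF 1.x checkpoints, because tf.keras.optimizers generates a different set of variables to be saved in checkpoints. To make old checkpoints reusable after migrating to TF 2.0, use the checkpoint converter tool to convert TF 1.x checkpoints to TF 2.0. The converted checkpoints can then be used to restore pre-trained models in TF 2.0.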

TensorShape

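This class was simplified to hold ints instead of tf.compat.v1.Dimension objects, so there is no need to call .value to get an int.

Individual tf.compat.v1.Dimension objects are still accessible from tf.TensorShape.dims.

The following code demonstrates the differences between TensorFlow 1.x and TensorFlow 2.0.

# Create a shape and pick an index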
i = 0
shape = tf.TensorShape([16, None, 256])
shape

TensorShape([16, None, 256])

If you had this in TF 1.x:

value = shape[i].value

Then do this in TF 2.0:

value = shape[i]
value

If you had this in TF 1.x:

for dim in shape:
	value = dim.value
	print(value)

Then do this in TF 2.0:

for value in shape:
	print(value)

16
None
256

If you had this in TF 1.x (or used any other dimension method):

dim = shape[i]
dim.assert_is_compatible_with(other_dim)

Then do this in TF 2.0:

other_dim = 16
Dimension = tf.compat.v1.Dimension

if shape.rank is None:
	dim = Dimension(None)
else:
	dim = shape.dims[i]
dim.is_compatible_with(other_dim) # or any other dimension method

True

shape = tf.TensorShape(None)

if shape:
	dim = shape.dims[i]
	dim.is_compatible_with(other_dim) # or any other dimension method

The boolean value of a tf.TensorShape is True if the rank is known, and False otherwise.

print(bool(tf.TensorShape([])))      # Scalar
print(bool(tf.TensorShape([0])))     # 0-length vector
print(bool(tf.TensorShape([1])))     # 1-length vector
print(bool(tf.TensorShape([None])))  # Unknown-length vector
print(bool(tf.TensorShape([1, 10, 100])))       # 3D tensor
print(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions
print()
print(bool(tf.TensorShape(None)))  # A tensor with unknown rank.
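The checks above can be wrapped in a small helper, as a sketch assuming TensorFlow 2.x; known_dim is a hypothetical name, not a TensorFlow API. It returns the i-th dimension as a Python int, None for an unknown dimension, and None when the rank itself is unknown.

```python
import tensorflow as tf

def known_dim(shape, i):
    """Return dimension i of `shape` as an int, or None if it is unknown."""
    shape = tf.TensorShape(shape)
    if shape.rank is None:       # unknown rank: there are no dimensions to index
        return None
    return shape[i]              # in TF 2.x this is already an int or None

print(known_dim([16, None, 256], 0))  # 16
print(known_dim([16, None, 256], 1))  # None
print(known_dim(None, 0))             # None
```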