TensorFlow 2.0 Official Guide Study Notes 22: ‘Migrate your TensorFlow 1 code to TensorFlow 2’, Part 2

These notes follow the official TensorFlow Guide and mainly translate and restructure the ‘Migrate your TensorFlow 1 code to TensorFlow 2’ tutorial. Original article: Migrate your TensorFlow 1 code to TensorFlow 2


6. Saving and loading

6.1 Checkpoint compatibility

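TensorFlow 2.0 uses object-based checkpoints.
Old-style name-based checkpoints can still be loaded if you are careful. The code conversion process may change variable names, but there are workarounds.
The simplest approach is to line up the names of the new model with the names in the checkpoint:
- Variables still have a ‘name’ argument that you can set.
- Keras models also take a ‘name’ argument, which they use as the prefix for their variables.
- The ‘name_scope’ function can be used to set variable name prefixes. This is very different from ‘tf.variable_scope’: it only affects names and does not track variables or reuse.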
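A minimal sketch of the naming options listed above, assuming TensorFlow 2.x in eager mode. The variable and scope names (‘kernel’, ‘encoder’) are made up for illustration.

```python
import tensorflow as tf

# The `name` argument sets the variable's name directly.
plain = tf.Variable(1.0, name="kernel")

# tf.name_scope only adds a prefix to names created inside it. Unlike
# tf.compat.v1.variable_scope, it does not track or reuse variables.
with tf.name_scope("encoder"):
    scoped = tf.Variable(1.0, name="kernel")

print(plain.name)
print(scoped.name)
```

Comparing these names with the ones stored in an old checkpoint shows which prefixes still need to be adjusted.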
If none of this works for your use case, try the ‘v1.train.init_from_checkpoint’ function. It takes an ‘assignment_map’ argument, which specifies the mapping from old names to new names.
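The sketch below shows one way ‘assignment_map’ might be used. It is an illustration under assumptions, not the guide's own code: the checkpoint is a throwaway written in a temporary directory, and the names ‘old_scope/weights’ and ‘new_scope/weights’ are hypothetical. Both steps run in 1.x-style graph mode through ‘tf.compat.v1’.

```python
import os
import tempfile

import tensorflow as tf

ckpt_dir = tempfile.mkdtemp()

# 1. Write a small name-based checkpoint in 1.x style (graph + session).
with tf.Graph().as_default():
    tf.compat.v1.get_variable(
        "old_scope/weights", shape=[2],
        initializer=tf.compat.v1.zeros_initializer())
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        ckpt_path = saver.save(sess, os.path.join(ckpt_dir, "model.ckpt"))

# 2. Inspect the variable names stored in the checkpoint.
stored_names = [name for name, _ in tf.train.list_variables(ckpt_path)]

# 3. Build the "new" model under a different name and map old -> new.
with tf.Graph().as_default():
    new_weights = tf.compat.v1.get_variable(
        "new_scope/weights", shape=[2],
        initializer=tf.compat.v1.ones_initializer())
    tf.compat.v1.train.init_from_checkpoint(
        ckpt_path, {"old_scope/weights": "new_scope/weights"})
    with tf.compat.v1.Session() as sess:
        # The initializer now reads from the checkpoint instead of using ones.
        sess.run(tf.compat.v1.global_variables_initializer())
        restored = sess.run(new_weights)

print(stored_names, restored)
```

The keys of the map are names in the checkpoint and the values are names (or variable objects) in the current model.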

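Note: Unlike object-based checkpoints, which can defer loading, name-based checkpoints require all variables to be built when the function is called. Some models defer building variables until ‘build’ is called or the model is run on a batch of data.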
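To make the deferred loading of object-based checkpoints concrete, here is a small sketch using ‘tf.train.Checkpoint’. The ‘LazyLinear’ class is a hypothetical stand-in for a model that creates its variable only on the first call.

```python
import os
import tempfile

import tensorflow as tf


class LazyLinear(tf.Module):
    """Hypothetical layer that creates its weight on the first call."""

    def __init__(self):
        super().__init__()
        self.w = None

    def __call__(self, x):
        if self.w is None:
            self.w = tf.Variable(tf.ones([x.shape[-1], 1]), name="w")
        return tf.matmul(x, self.w)


# Save a model whose weight has been set to 5s.
saved_model = LazyLinear()
saved_model(tf.ones([1, 2]))
saved_model.w.assign(tf.fill([2, 1], 5.0))
prefix = os.path.join(tempfile.mkdtemp(), "ckpt")
save_path = tf.train.Checkpoint(model=saved_model).save(prefix)

# Restore into a fresh model before its variable exists.
fresh_model = LazyLinear()
tf.train.Checkpoint(model=fresh_model).restore(save_path)

# The variable is created here, and the pending restore is applied to it.
output = fresh_model(tf.ones([1, 2]))
print(output.numpy())
```

A name-based loader has no such pending state, which is why the variables must already exist when it runs.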

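The TensorFlow Estimator repository includes a conversion tool that upgrades the checkpoints of premade Estimators from TensorFlow 1.x to 2.0. It can serve as an example of how to build a tool for a similar use case.

6.2 Saved model compatibility

TensorFlow 1.x saved models work in 2.0, and 2.0 saved models even work in 1.x if all the ops are supported.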
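As a rough illustration, the sketch below saves a model with the 2.x API and then reads it back both the 2.x way and through the 1.x-style loader in ‘tf.compat.v1’. It runs entirely inside TensorFlow 2.x, so it only approximates what a real 1.x installation would do. ‘Doubler’ is a hypothetical stateless model.

```python
import tempfile

import tensorflow as tf


class Doubler(tf.Module):
    """Hypothetical stateless model used only for this illustration."""

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32, name="x")])
    def __call__(self, x):
        return 2.0 * x


export_dir = tempfile.mkdtemp()
module = Doubler()
tf.saved_model.save(
    module, export_dir,
    signatures=module.__call__.get_concrete_function())

# 2.x style: load as an object and call it.
restored = tf.saved_model.load(export_dir)
v2_output = restored(tf.constant([1.0, 2.0]))

# 1.x style: load into a graph and run the serving signature in a session.
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    meta_graph = tf.compat.v1.saved_model.loader.load(
        sess, [tf.saved_model.SERVING], export_dir)
    signature = meta_graph.signature_def["serving_default"]
    input_name = signature.inputs["x"].name
    output_name = list(signature.outputs.values())[0].name
    v1_output = sess.run(output_name, feed_dict={input_name: [1.0, 2.0]})

print(v2_output.numpy(), v1_output)
```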

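6.3 ‘Graph.pb’ or ‘Graph.pbtxt’

There is no straightforward way to upgrade a raw ‘Graph.pb’ file to 2.0. The best option is to upgrade the code that generated the file.
However, if you have a ‘frozen graph’ (a tf.Graph whose variables have been turned into constants), you can convert it to a concrete_function using v1.wrap_function: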

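import tensorflow as tf

def wrap_frozen_graph(graph_def, inputs, outputs):
  # Import the frozen GraphDef into a wrapped function, then prune it down to
  # the requested input and output tensors.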
  def _imports_graph_def():
    tf.compat.v1.import_graph_def(graph_def, name="")
  wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
  import_graph = wrapped_import.graph
  return wrapped_import.prune(
      tf.nest.map_structure(import_graph.as_graph_element, inputs),
      tf.nest.map_structure(import_graph.as_graph_element, outputs))

For example, here is a frozen graph for ‘Inception v1’:

path = tf.keras.utils.get_file(
    'inception_v1_2016_08_28_frozen.pb',
    'http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz',
    untar=True)
Downloading data from http://storage.googleapis.com/download.tensorflow.org/models/inception_v1_2016_08_28_frozen.pb.tar.gz
24698880/24695710 [==============================] - 0s 0us/step

Load it into a ‘tf.GraphDef’:

graph_def = tf.compat.v1.GraphDef()
with open(path, 'rb') as f:
  loaded = graph_def.ParseFromString(f.read())

Wrap it into a ‘concrete_function’:

inception_func = wrap_frozen_graph(
    graph_def, inputs='input:0',
    outputs='InceptionV1/InceptionV1/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu:0')

Pass it a tensor as input:

input_img = tf.ones([1,224,224,3], dtype=tf.float32)
inception_func(input_img).shape
TensorShape([1, 28, 28, 96])
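The same wrapping steps can be tried without downloading Inception. The sketch below repeats ‘wrap_frozen_graph’ so that it is self-contained and applies it to a tiny hand-built graph. The graph and its tensor names (‘x:0’, ‘y:0’) are assumptions made up for this example.

```python
import tensorflow as tf


def wrap_frozen_graph(graph_def, inputs, outputs):
    def _imports_graph_def():
        tf.compat.v1.import_graph_def(graph_def, name="")
    wrapped_import = tf.compat.v1.wrap_function(_imports_graph_def, [])
    import_graph = wrapped_import.graph
    return wrapped_import.prune(
        tf.nest.map_structure(import_graph.as_graph_element, inputs),
        tf.nest.map_structure(import_graph.as_graph_element, outputs))


# Build a tiny graph with no variables, so it is already "frozen".
with tf.Graph().as_default() as graph:
    x = tf.compat.v1.placeholder(tf.float32, shape=[None, 2], name="x")
    tf.multiply(x, 3.0, name="y")
graph_def = graph.as_graph_def()

# Prune the imported graph to a function from "x:0" to "y:0" and call it.
triple = wrap_frozen_graph(graph_def, inputs="x:0", outputs="y:0")
result = triple(tf.ones([1, 2]))
print(result.numpy())
```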

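7. Estimators

7.1 Training with Estimators

Estimators are supported in 2.0. When using an estimator, you can keep using ‘input_fn()’, ‘tf.estimator.TrainSpec’, and ‘tf.estimator.EvalSpec’ from 1.x.
Here is an example that uses input_fn with train and evaluate specs.
Create the input_fn and the train/eval specs: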

# Define the estimator's input_fn
def input_fn():
  datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  BUFFER_SIZE = 10000
  BATCH_SIZE = 64

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255

    return image, label[..., tf.newaxis]

  train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
  return train_data.repeat()

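# Define train & eval specs.
# STEPS_PER_EPOCH and NUM_EPOCHS must be defined before this point; the logs
# below (25 training steps, 5 evaluation steps) correspond to 5 and 5.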
train_spec = tf.estimator.TrainSpec(input_fn=input_fn,
                                    max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)
eval_spec = tf.estimator.EvalSpec(input_fn=input_fn,
                                  steps=STEPS_PER_EPOCH)
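To check the shape of what such an input_fn yields without downloading MNIST, the pipeline can be sketched on random data. ‘synthetic_input_fn’ and its sizes are hypothetical; only the ‘scale’, ‘shuffle’, ‘batch’, and ‘repeat’ steps mirror the code above.

```python
import numpy as np
import tensorflow as tf


def synthetic_input_fn(batch_size=64, num_examples=256):
    """Hypothetical stand-in for the MNIST input_fn, using random data."""
    images = np.random.randint(
        0, 256, size=(num_examples, 28, 28, 1), dtype=np.uint8)
    labels = np.random.randint(0, 10, size=(num_examples,), dtype=np.int64)
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))

    def scale(image, label):
        # Same preprocessing as above: floats in [0, 1], labels of shape [1].
        image = tf.cast(image, tf.float32) / 255.0
        return image, label[..., tf.newaxis]

    return dataset.map(scale).shuffle(num_examples).batch(batch_size).repeat()


images, labels = next(iter(synthetic_input_fn()))
print(images.shape, labels.shape)
```

Each element is an ‘(image, label)’ batch, which is the structure an Estimator expects an input_fn's dataset to produce as ‘(features, labels)’.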

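7.2 Using a Keras model definition

Estimators are constructed somewhat differently in 2.0.
Google recommends defining the model with Keras and then using the ‘tf.keras.estimator.model_to_estimator’ utility to turn it into an estimator. The code below shows how to use this utility when creating and training an estimator.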

def make_model():
  return tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.02),
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.1),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation='softmax')
  ])
model = make_model()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

estimator = tf.keras.estimator.model_to_estimator(
  keras_model = model
)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp02wxj64o
INFO:tensorflow:Using the Keras model provided.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp02wxj64o', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4137c8b7f0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp02wxj64o/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tmp02wxj64o/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 8 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp02wxj64o/model.ckpt.
INFO:tensorflow:loss = 2.795739, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp02wxj64o/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-11-13T01:32:41Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp02wxj64o/model.ckpt-25
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Finished evaluation at 2019-11-13-01:32:42
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.734375, global_step = 25, loss = 1.4916394
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp02wxj64o/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.4597902.

({'accuracy': 0.734375, 'loss': 1.4916394, 'global_step': 25}, [])

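7.3 Using a custom ‘model_fn’

If you have an existing custom estimator model_fn that you need to maintain, you can convert it to use a Keras model.
However, for compatibility reasons, a custom ‘model_fn’ still runs in 1.x-style graph mode. This means there is no eager execution and no automatic control dependencies.

Custom model_fn with minimal changes

To make a custom model_fn work in 2.0 with minimal changes to the existing code, you can use ‘tf.compat.v1’ symbols such as ‘optimizers’ and ‘metrics’.

Using a Keras model in a custom ‘model_fn’ is similar to using it in a custom training loop:

  • Set the training phase appropriately, based on the mode argument
  • Explicitly pass the model's ‘trainable_variables’ to the optimizer

But there are important differences relative to a custom loop:

  • Extract the losses with ‘Model.get_losses_for’ instead of ‘Model.losses’

  • Extract the model's updates with ‘Model.get_updates_for’

    Note: ‘Updates’ are changes that need to be applied to a model after each batch, for example the moving averages of the mean and variance in a ‘layers.BatchNormalization’ layer.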
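To see what these losses and updates are, the sketch below inspects them in plain eager mode, outside any Estimator. It uses the ‘losses’ property and a direct look at ‘moving_mean’ rather than ‘get_losses_for’ or ‘get_updates_for’. The layer sizes and values are arbitrary choices for illustration.

```python
import tensorflow as tf

# Regularization losses are collected on a layer after it has been called.
dense = tf.keras.layers.Dense(
    1, kernel_initializer="ones",
    kernel_regularizer=tf.keras.regularizers.l2(0.02))
dense(tf.ones([4, 2]))
# The kernel is a 2x1 matrix of ones, so the loss is 0.02 * 2.
reg_loss = float(tf.add_n(dense.losses))

# BatchNormalization updates its moving statistics only when training=True.
bn = tf.keras.layers.BatchNormalization(momentum=0.9)
batch = tf.fill([4, 1], 10.0)
bn(batch, training=False)
mean_after_inference = float(bn.moving_mean.numpy()[0])
bn(batch, training=True)
# One update step: 0.9 * 0 + 0.1 * 10.
mean_after_training = float(bn.moving_mean.numpy()[0])

print(reg_loss, mean_after_inference, mean_after_training)
```

In graph mode these updates are separate ops, which is why a ‘model_fn’ has to group them with the optimizer step explicitly.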

The following code creates an estimator from a custom ‘model_fn’ and illustrates all of these concerns:

def my_model_fn(features, labels, mode):
  model = make_model()

  optimizer = tf.compat.v1.train.AdamOptimizer()
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  predictions = model(features, training=training)

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_fn(labels, predictions) + tf.math.add_n(reg_losses)

  accuracy = tf.compat.v1.metrics.accuracy(labels=labels,
                                           predictions=tf.math.argmax(predictions, axis=1),
                                           name='acc_op')

  update_ops = model.get_updates_for(None) + model.get_updates_for(features)
  minimize_op = optimizer.minimize(
      total_loss,
      var_list=model.trainable_variables,
      global_step=tf.compat.v1.train.get_or_create_global_step())
  train_op = tf.group(minimize_op, update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op, eval_metric_ops={'accuracy': accuracy})

# Create the Estimator & Train
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp99k4r643
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp99k4r643', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f40781cb128>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp99k4r643/model.ckpt.
INFO:tensorflow:loss = 2.7229137, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp99k4r643/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-11-13T01:32:47Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp99k4r643/model.ckpt-25
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Finished evaluation at 2019-11-13-01:32:48
INFO:tensorflow:Saving dict for global step 25: accuracy = 0.6, global_step = 25, loss = 1.6397461
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp99k4r643/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.59229.

({'accuracy': 0.6, 'loss': 1.6397461, 'global_step': 25}, [])

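Custom ‘model_fn’ with TF 2.0 symbols
To remove all TF 1.x symbols and upgrade a custom model_fn to native TF 2.0, update the optimizer and metrics to ‘tf.keras.optimizers’ and ‘tf.keras.metrics’.
Besides the changes above, a few more upgrades are needed in the custom model_fn:
- Use ‘tf.keras.optimizers’ instead of ‘v1.train.Optimizer’
- Explicitly pass the model's ‘trainable_variables’ to the ‘tf.keras.optimizers’ optimizer
- To compute the ‘train_op/minimize_op’:
(1) If the loss is a scalar loss tensor (not a callable), use ‘Optimizer.get_updates()’. The first element of the returned list is the desired ‘train_op/minimize_op’
(2) If the loss is a callable (such as a function), use ‘Optimizer.minimize()’ to get the ‘train_op/minimize_op’
- Use ‘tf.keras.metrics’ instead of ‘tf.compat.v1.metrics’ for evaluation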
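As a small illustration of the metrics change, a ‘tf.keras.metrics’ object keeps its own state: it is updated with ‘update_state’ and read with ‘result’. The labels and probabilities below are made-up values.

```python
import tensorflow as tf

labels = tf.constant([1, 0, 0])
probabilities = tf.constant([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])

accuracy = tf.keras.metrics.Accuracy()
# argmax turns the class probabilities into predicted labels: [1, 0, 1].
accuracy.update_state(y_true=labels,
                      y_pred=tf.math.argmax(probabilities, axis=1))
accuracy_value = float(accuracy.result())  # 2 of 3 predictions are correct
print(accuracy_value)
```

This is why the migrated ‘model_fn’ can hand the metric object itself to ‘eval_metric_ops’ instead of a ‘(value, update_op)’ pair.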
For the ‘my_model_fn’ example above, the migrated code using 2.0 symbols looks like this:

def my_model_fn(features, labels, mode):
  model = make_model()

  training = (mode == tf.estimator.ModeKeys.TRAIN)
  loss_obj = tf.keras.losses.SparseCategoricalCrossentropy()
  predictions = model(features, training=training)

  # Get both the unconditional losses (the None part)
  # and the input-conditional losses (the features part).
  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)
  total_loss = loss_obj(labels, predictions) + tf.math.add_n(reg_losses)

  # Upgrade to tf.keras.metrics.
  accuracy_obj = tf.keras.metrics.Accuracy(name='acc_obj')
  accuracy = accuracy_obj.update_state(
      y_true=labels, y_pred=tf.math.argmax(predictions, axis=1))

  train_op = None
  if training:
    # Upgrade to tf.keras.optimizers.
    optimizer = tf.keras.optimizers.Adam()
    # Manually assign tf.compat.v1.global_step variable to optimizer.iterations
    # to make tf.compat.v1.train.global_step increased correctly.
    # This assignment is a must for any `tf.train.SessionRunHook` specified in
    # estimator, as SessionRunHooks rely on global step.
    optimizer.iterations = tf.compat.v1.train.get_or_create_global_step()
    # Get both the unconditional updates (the None part)
    # and the input-conditional updates (the features part).
    update_ops = model.get_updates_for(None) + model.get_updates_for(features)
    # Compute the minimize_op.
    minimize_op = optimizer.get_updates(
        total_loss,
        model.trainable_variables)[0]
    train_op = tf.group(minimize_op, *update_ops)

  return tf.estimator.EstimatorSpec(
    mode=mode,
    predictions=predictions,
    loss=total_loss,
    train_op=train_op,
    eval_metric_ops={'Accuracy': accuracy_obj})

# Create the Estimator & Train.
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmp14m9uia8
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp14m9uia8', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f402c58c3c8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp14m9uia8/model.ckpt.
INFO:tensorflow:loss = 2.569507, step = 0
INFO:tensorflow:Saving checkpoints for 25 into /tmp/tmp14m9uia8/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-11-13T01:32:52Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp14m9uia8/model.ckpt-25
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/5]
INFO:tensorflow:Evaluation [2/5]
INFO:tensorflow:Evaluation [3/5]
INFO:tensorflow:Evaluation [4/5]
INFO:tensorflow:Evaluation [5/5]
INFO:tensorflow:Finished evaluation at 2019-11-13-01:32:53
INFO:tensorflow:Saving dict for global step 25: Accuracy = 0.721875, global_step = 25, loss = 1.6017154
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 25: /tmp/tmp14m9uia8/model.ckpt-25
INFO:tensorflow:Loss for final step: 0.61313796.

({'Accuracy': 0.721875, 'loss': 1.6017154, 'global_step': 25}, [])

7.4 Premade Estimators

The premade Estimators in the ‘tf.estimator.DNN*’, ‘tf.estimator.Linear*’, and ‘tf.estimator.DNNLinearCombined*’ families are still supported in 2.0. However, some arguments have changed:
1. ‘input_layer_partitioner’: removed in 2.0.
2. ‘loss_reduction’: now takes ‘tf.keras.losses.Reduction’ instead of ‘tf.compat.v1.losses.Reduction’. Its default value also changed from ‘tf.compat.v1.losses.Reduction.SUM’ to ‘tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE’.
3. ‘optimizer’, ‘dnn_optimizer’, and ‘linear_optimizer’: these arguments now take ‘tf.keras.optimizers’ instead of ‘tf.compat.v1.train.Optimizer’.
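The change in the default ‘loss_reduction’ alters the scale of the loss. The sketch below compares the two reductions on a made-up batch of two examples, using a plain Keras loss rather than an Estimator. The string values ‘sum’ and ‘sum_over_batch_size’ are assumed here to be the string forms of ‘Reduction.SUM’ and ‘Reduction.SUM_OVER_BATCH_SIZE’.

```python
import tensorflow as tf

y_true = tf.constant([[0.0], [0.0]])
y_pred = tf.constant([[1.0], [3.0]])  # per-example squared errors: 1 and 9

# Old default behavior: add up the per-example losses.
sum_loss = float(
    tf.keras.losses.MeanSquaredError(reduction="sum")(y_true, y_pred))

# New default behavior: divide that sum by the batch size.
mean_loss = float(
    tf.keras.losses.MeanSquaredError(
        reduction="sum_over_batch_size")(y_true, y_pred))

print(sum_loss, mean_loss)
```

Because the two values differ by a factor of the batch size, gradients differ by the same factor, so a learning rate tuned under the old default may need to be revisited.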

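Checkpoint Converter
Migrating to keras.optimizers breaks checkpoints saved with TF 1.x, because tf.keras.optimizers generates a different set of variables to be saved in checkpoints. To make old checkpoints reusable after migrating to TF 2.0, try the checkpoint converter tool.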

! curl -O https://raw.githubusercontent.com/tensorflow/estimator/master/tensorflow_estimator/python/estimator/tools/checkpoint_converter.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15371  100 15371    0     0  24993      0 --:--:-- --:--:-- --:--:-- 24952

The tool has built-in help:

! python checkpoint_converter.py -h
usage: checkpoint_converter.py [-h]
                               {dnn,linear,combined} source_checkpoint
                               source_graph target_checkpoint

positional arguments:
  {dnn,linear,combined}
                        The type of estimator to be converted. So far, the
                        checkpoint converter only supports Canned Estimator.
                        So the allowed types include linear, dnn and combined.
  source_checkpoint     Path to source checkpoint file to be read in.
  source_graph          Path to source graph file to be read in.
  target_checkpoint     Path to checkpoint file to be written out.

optional arguments:
  -h, --help            show this help message and exit

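8. TensorShape

This class was simplified to hold ints instead of ‘tf.compat.v1.Dimension’ objects, so there is no need to read ‘.value’ to get an int.
Individual ‘tf.compat.v1.Dimension’ objects are still accessible from ‘tf.TensorShape.dims’.
The following demonstrates the differences between 1.x and 2.0: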

# Create a shape and choose an index
i = 0
shape = tf.TensorShape([16, None, 256])
shape
TensorShape([16, None, 256])

If you had this in 1.x:

value = shape[i].value

Then do this in 2.0:

value = shape[i]
value
16

If you had this in 1.x:

for dim in shape:
    value = dim.value
    print(value)

Then do this in 2.0:

for value in shape:
  print(value)
16
None
256

If you had this in 1.x (or used any other dimension method):

dim = shape[i]
dim.assert_is_compatible_with(other_dim)

Then do this in 2.0:

other_dim = 16
Dimension = tf.compat.v1.Dimension

if shape.rank is None:
  dim = Dimension(None)
else:
  dim = shape.dims[i]
dim.is_compatible_with(other_dim) # or any other dimension method
True

shape = tf.TensorShape(None)

if shape:
  dim = shape.dims[i]
  dim.is_compatible_with(other_dim) # or any other dimension method

The boolean value of a ‘tf.TensorShape’ is ‘True’ if the rank is known and ‘False’ otherwise.

print(bool(tf.TensorShape([])))      # Scalar
print(bool(tf.TensorShape([0])))     # 0-length vector
print(bool(tf.TensorShape([1])))     # 1-length vector
print(bool(tf.TensorShape([None])))  # Unknown-length vector
print(bool(tf.TensorShape([1, 10, 100])))       # 3D tensor
print(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions
print()
print(bool(tf.TensorShape(None)))  # A tensor with unknown rank.
True
True
True
True
True
True

False
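The rank check and the int-valued indexing can be combined into a small helper. ‘static_dim’ is a hypothetical name, not part of TensorFlow, and the sketch assumes the 2.0 shape behavior described above.

```python
import tensorflow as tf


def static_dim(shape, index):
    """Returns dimension `index` as an int, or None if it is unknown."""
    shape = tf.TensorShape(shape)
    if shape.rank is None:  # unknown rank: no dimension can be read
        return None
    return shape[index]     # an int or None in 2.0, no `.value` needed


known = static_dim(tf.TensorShape([16, None, 256]), 0)
unknown_dim = static_dim(tf.TensorShape([16, None, 256]), 1)
unknown_rank = static_dim(tf.TensorShape(None), 0)
print(known, unknown_dim, unknown_rank)
```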

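9. Other changes

  • Remove ‘tf.colocate_with’: TensorFlow's device placement algorithms have improved significantly, so ‘tf.colocate_with’ should no longer be necessary. If removing it causes a performance degradation, file a bug with Google
  • Replace ‘v1.ConfigProto’ with the equivalent functions from ‘tf.config’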
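Two common ‘ConfigProto’ settings and their likely ‘tf.config’ counterparts are sketched below. This assumes TensorFlow 2.1 or later, where ‘tf.config.list_physical_devices’ is available, and it must run before any GPU has been initialized. Which options a given project needs is an assumption of this example.

```python
import tensorflow as tf

# ConfigProto(allow_soft_placement=True) becomes:
tf.config.set_soft_device_placement(True)

# ConfigProto(gpu_options=GPUOptions(allow_growth=True)) becomes, per GPU:
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

soft_placement = tf.config.get_soft_device_placement()
print(soft_placement, len(gpus))
```

On a machine without a GPU the loop simply does nothing.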

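10. Conclusions

The overall process is:
1. Run the upgrade script
2. Remove contrib symbols
3. Switch your models to an object-oriented style (Keras)
4. Use ‘tf.keras’ or ‘tf.estimator’ training and evaluation loops where you can
5. Otherwise, use custom loops, but be sure to avoid sessions and collections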
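For the last step, a custom loop without sessions or collections might look like the sketch below. The one-parameter regression problem, the learning rate, and the step count are all made up for illustration.

```python
import tensorflow as tf

# Toy data for y = 3 * x. The single weight w should converge to 3.
x = tf.constant([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x
w = tf.Variable(0.0)


@tf.function
def train_step(learning_rate=0.05):
    # Record the forward pass so the gradient can be computed afterwards.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x - y))
    grad = tape.gradient(loss, w)
    # Plain gradient descent applied directly to the variable.
    w.assign_sub(learning_rate * grad)
    return loss


for _ in range(20):
    loss = train_step()

print(float(w.numpy()), float(loss))
```

State lives in ordinary Python objects (‘w’), and ‘tf.function’ supplies graph execution without an explicit session.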

Converting code to idiomatic TensorFlow 2.0 takes some work, but every change results in:
- Fewer lines of code
- Increased clarity and simplicity
- Easier debugging
