Using tf.contrib.training.train


Source: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/training/python/training/training.py

1. What it does

  • Simplifies training with TensorFlow and reduces boilerplate code
  • Wraps the training process in a high-level API
  • Drawback: a bit hard to understand at first
  • Advantage: highly integrated; once you understand it, it is very convenient and lets you set up training quickly, without hand-writing the training loop and its control flow yourself

This module contains routines and helper functions for training models. These include computing gradients, creating a train_op (train_op = compute the loss + run backpropagation), and a training-loop function. The training loop lets the user pass in a train_op and run the optimization according to user-specified parameters.

In short, we need to define a train_op with two parts: how to compute the loss and how to run backpropagation. At every step, the contents of the train_op are executed: gradients are computed and backpropagation is applied. Defining the train_op is therefore the core of tf.contrib.training.train.
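To make the division of labor concrete, here is a minimal pure-Python sketch of the idea (an illustrative model only, not tf.contrib code): the train_op computes the loss, applies a gradient update, and returns the loss value, while the training loop simply calls it over and over.

```python
def make_train_op(params, grad_fn, loss_fn, lr=0.1):
    """Toy model of create_train_op: returns a callable doing one step."""
    def train_op():
        loss = loss_fn(params)      # (a) compute the loss
        g = grad_fn(params)         # stand-in for backpropagation
        params[0] -= lr * g         # (b) apply the gradient update
        return loss                 # (c) return the loss value
    return train_op

# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3).
params = [0.0]
train_op = make_train_op(params,
                         grad_fn=lambda p: 2 * (p[0] - 3),
                         loss_fn=lambda p: (p[0] - 3) ** 2)

for _ in range(100):                # toy model of the training loop
    loss = train_op()

print(round(params[0], 3))          # converges to 3.0
```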

2. A simple example of using train

2.1 Load the data and create the model

images, labels = LoadData(...)
predictions = MyModel(images)

2.2 Define the loss

tf.contrib.losses.log_loss(predictions, labels)
total_loss = tf.contrib.losses.get_total_loss()

2.3 Define the optimizer

optimizer = tf.compat.v1.train.MomentumOptimizer(FLAGS.learning_rate, FLAGS.momentum)

2.4 Create the train_op

train_op = tf.contrib.training.create_train_op(total_loss, optimizer)

2.5 Run training

# Run training.
tf.contrib.training.train(train_op, my_log_dir)

3. How to define the train_op

As stated above, the core of training a model with train is creating a train_op. Below we look at how to create one.

To use the train function, we need to create a train_op, which does three things:

  • (a) computes the loss,
  • (b) applies the gradients to update the weights, and
  • (c) returns the value of the loss.

We use tf.contrib.training.create_train_op to create such a train_op, as follows:

train_op = tf.contrib.training.create_train_op(total_loss, optimizer)

Not clear yet? No problem; let's walk through a few examples.

3.1 Create a train_op that clips gradients

train_op = tf.contrib.training.create_train_op(
    total_loss,
    optimizer,
    transform_grads_fn=tf.contrib.training.clip_gradient_norms_fn(3))
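For intuition, clipping by norm rescales a gradient whose L2 norm exceeds the threshold down to that threshold, and leaves smaller gradients untouched. Here is a pure-Python sketch of the per-gradient rule (illustrative only; the real clip function wraps TensorFlow ops):

```python
import math

def clip_by_norm(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return grad
    return [g * (clip_norm / norm) for g in grad]

clipped = clip_by_norm([3.0, 4.0], 3.0)   # norm 5.0 is clipped down to 3.0
small = clip_by_norm([0.3, 0.4], 3.0)     # norm 0.5 passes through unchanged
print(clipped, small)
```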

3.2 Create the train_op and scale the gradients by providing a map from variable names to scaling coefficients

 # Create the train_op and scale the gradients by providing a map from variable
  # name (or variable) to a scaling coefficient:
  def transform_grads_fn(grads):
    gradient_multipliers = {
      'conv0/weights': 1.2,
      'fc8/weights': 3.4,
    }
    return tf.contrib.training.multiply_gradients(
            grads, gradient_multipliers)

  train_op = tf.contrib.training.create_train_op(
      total_loss,
      optimizer,
      transform_grads_fn=transform_grads_fn)
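The transform receives the list of (gradient, variable) pairs and returns a new list. A pure-Python sketch of what multiply_gradients does (illustrative; here variable names stand in for tf.Variable objects):

```python
def multiply_gradients(grads_and_vars, gradient_multipliers):
    """Toy model of multiply_gradients: scale selected gradients."""
    out = []
    for grad, var_name in grads_and_vars:
        coef = gradient_multipliers.get(var_name, 1.0)  # default: unchanged
        out.append((grad * coef, var_name))
    return out

grads = [(0.5, 'conv0/weights'), (1.0, 'fc8/weights'), (2.0, 'fc7/biases')]
scaled = multiply_gradients(grads, {'conv0/weights': 1.2, 'fc8/weights': 3.4})
print(scaled)
```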

4. Non-gradient updates during training

Many networks contain modules, such as batch norm, that need to perform a series of non-gradient updates during training: for example, batch norm maintains moving averages of the mean and variance, which are refreshed by separate ops rather than by gradient descent. tf.contrib.training.create_train_op allows the user to pass in a list of update_ops that are run together with the gradient updates.

  train_op = tf.contrib.training.create_train_op(
      total_loss, optimizer, update_ops)

By default, tf.contrib.training.create_train_op includes all update ops that are
part of the `tf.GraphKeys.UPDATE_OPS` collection. Additionally, the
tf.contrib.layers.batch_norm function adds the moving mean and moving variance
updates to this collection. Consequently, users who want to use
tf.contrib.layers.batch_norm will not need to take any additional steps in order
to have the moving mean and moving variance updates be computed.

However, users with additional, specialized updates can either override the
default update ops or simply add additional update ops to the
`tf.GraphKeys.UPDATE_OPS` collection:

  # Force `create_train_op` to NOT use ANY update_ops:
  train_op = tf.contrib.training.create_train_op(
     total_loss,
     optimizer,
     update_ops=[])

  # Use an alternative set of update ops:
  train_op = tf.contrib.training.create_train_op(
     total_loss,
     optimizer,
     update_ops=my_other_update_ops)

  # Use a set of update ops in addition to the default updates:
  tf.compat.v1.add_to_collection(tf.GraphKeys.UPDATE_OPS, my_update0)
  tf.compat.v1.add_to_collection(tf.GraphKeys.UPDATE_OPS, my_update1)

  train_op = tf.contrib.training.create_train_op(
     total_loss,
     optimizer)

  # Which is the same as:
  train_op = tf.contrib.training.create_train_op(
     total_loss,
     optimizer,
     update_ops=tf.compat.v1.get_collection(tf.GraphKeys.UPDATE_OPS))
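The point of update_ops is easiest to see with batch norm's moving statistics: the moving mean is not trained by gradients; it is refreshed by a separate update op that runs alongside each training step. A pure-Python sketch of that pattern (illustrative only):

```python
class MovingAverage:
    """Toy model of the moving-mean update op that batch norm registers."""
    def __init__(self, decay=0.5):
        self.decay = decay
        self.value = 0.0

    def update(self, batch_mean):
        # new = decay * old + (1 - decay) * batch_mean
        self.value = self.decay * self.value + (1 - self.decay) * batch_mean

update_ops = [MovingAverage(decay=0.5)]

def train_step(batch_mean):
    # ... the gradient update would happen here ...
    for op in update_ops:        # then the registered update ops run
        op.update(batch_mean)

for mean in [4.0, 4.0, 4.0]:
    train_step(mean)

print(update_ops[0].value)       # drifts toward the batch mean of 4.0
```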

5. Initializing a model from a checkpoint


It is common to want to 'warm-start' a model from a pre-trained checkpoint.
One can use a tf.Scaffold and an initializing function to do so.

  ...

  # Create the train_op
  train_op = tf.contrib.training.create_train_op(total_loss, optimizer)

  # Create the initial assignment op
  checkpoint_path = '/path/to/old_model_checkpoint'
  variables_to_restore = tf.contrib.framework.get_model_variables()
  init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
      checkpoint_path, variables_to_restore)

  # Run training.
  scaffold = tf.Scaffold(init_fn=init_fn)
  tf.contrib.training.train(train_op, my_log_dir, scaffold=scaffold)

6. Initializing a model from a checkpoint with mismatched variable names

Sometimes we want to initialize a model from a checkpoint whose variable names do not match those of the current model. In that case, we need to create a mapping from the checkpoint's variable names to the current model's variables. This requires only a small modification of the code above:

  ...
  # Creates a model with two variables, var0 and var1
  predictions = MyModel(images)
  ...

  # Create the train_op
  train_op = tf.contrib.training.create_train_op(total_loss, optimizer)

  checkpoint_path = '/path/to/old_model_checkpoint'

  # Create the mapping:
  variables_to_restore = {
      'name_var_0_in_checkpoint':
          tf.contrib.framework.get_unique_variable('var0'),
      'name_var_1_in_checkpoint':
          tf.contrib.framework.get_unique_variable('var1')
  }
  init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
        checkpoint_path, variables_to_restore)
  scaffold = tf.Scaffold(init_fn=init_fn)

  # Run training.
  tf.contrib.training.train(train_op, my_log_dir, scaffold=scaffold)
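Conceptually, the mapping-based restore just copies each checkpoint value into the model variable it is mapped to. A pure-Python sketch with dicts standing in for the checkpoint and the model (the names here are hypothetical):

```python
def assign_from_checkpoint(checkpoint_values, variables_to_restore, model):
    """Toy model of assign_from_checkpoint_fn with a name mapping."""
    for ckpt_name, model_name in variables_to_restore.items():
        model[model_name] = checkpoint_values[ckpt_name]

# Checkpoint saved under old names; the current model uses new names.
checkpoint = {'old/var0': 1.5, 'old/var1': -2.0}
model = {'var0': 0.0, 'var1': 0.0}
assign_from_checkpoint(checkpoint,
                       {'old/var0': 'var0', 'old/var1': 'var1'},
                       model)
print(model)
```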

7. Fine-tuning a model from a checkpoint

Rather than initializing all of the weights of a given model, we sometimes
only want to restore some of the weights from a checkpoint. To do this, one
need only filter those variables to initialize as follows:

  ...

  # Create the train_op
  train_op = tf.contrib.training.create_train_op(total_loss, optimizer)

  checkpoint_path = '/path/to/old_model_checkpoint'

  # Specify the variables to restore via a list of inclusion or exclusion
  # patterns:
  variables_to_restore = tf.contrib.framework.get_variables_to_restore(
      include=["conv"], exclude=["fc8", "fc9"])
  # or
  variables_to_restore = tf.contrib.framework.get_variables_to_restore(
      exclude=["conv"])

  init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
      checkpoint_path, variables_to_restore)
  scaffold = tf.Scaffold(init_fn=init_fn)
  
  # Run training.
  tf.contrib.training.train(train_op, my_log_dir, scaffold=scaffold)
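The include/exclude filtering can be modeled as a simple name filter (a simplified sketch: the real get_variables_to_restore matches variable scopes, not arbitrary substrings):

```python
def get_variables_to_restore(all_vars, include=None, exclude=None):
    """Toy model of get_variables_to_restore: filter names by patterns."""
    result = []
    for name in all_vars:
        if include is not None and not any(p in name for p in include):
            continue                      # not in any inclusion pattern
        if exclude is not None and any(p in name for p in exclude):
            continue                      # matched an exclusion pattern
        result.append(name)
    return result

all_vars = ['conv0/weights', 'conv1/weights', 'fc8/weights', 'fc9/weights']
conv_only = get_variables_to_restore(all_vars, include=['conv'])
no_fc = get_variables_to_restore(all_vars, exclude=['fc8', 'fc9'])
print(conv_only, no_fc)
```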

8. Initializing model variables from values in memory



One may want to initialize the weights of a model from values coming from an
arbitrary source (a text document, matlab file, etc). While this is technically
feasible using assign operations, this strategy results in the values of your
weights being stored in the graph. For large models, this becomes prohibitively
large. However, it's possible to perform this initial assignment without having
to store the values of the initial model in the graph itself by using
placeholders and a feed dictionary:

  ...

  # Create the train_op
  train_op = tf.contrib.training.create_train_op(total_loss, optimizer)

  # Create the mapping from variable names to values:
  var0_initial_value = ReadFromDisk(...)
  var1_initial_value = ReadFromDisk(...)

  var_names_to_values = {
    'var0': var0_initial_value,
    'var1': var1_initial_value,
  }

  init_fn = tf.contrib.framework.assign_from_values_fn(var_names_to_values)
  scaffold = tf.Scaffold(init_fn=init_fn)

  # Run training.
  tf.contrib.training.train(train_op, my_log_dir, scaffold=scaffold)
"""
