Understanding tf.train.Saver() and tf.train.ExponentialMovingAverage()


不死谷神


Article link: https://blog.csdn.net/qq_29595303/article/details/97621706


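tf.train.Saver() and tf.train.ExponentialMovingAverage() are two TensorFlow classes. The first saves and restores a model's parameters; the second maintains exponential moving averages of variables. For the official documentation, see the pages on the moving average model and on saving models. There are also many explanations online, so here I describe my own understanding and the problems I ran into while using them.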

tf.train.Saver()

import tensorflow as tf

v1 = tf.Variable(tf.zeros([3]), name='v1')  # illustrative initial values
v2 = tf.Variable(tf.ones([3]), name='v2')

# Pass the variables as a dict (key: name in the checkpoint, value: variable):
saver = tf.compat.v1.train.Saver({'v1': v1, 'v2': v2})

# Or pass them as a list.
saver = tf.compat.v1.train.Saver([v1, v2])
# Passing a list is equivalent to passing a dict with the variable op names
# as keys:
saver = tf.compat.v1.train.Saver({v.op.name: v for v in [v1, v2]})

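As the code above shows, tf.train.Saver() is a class, so you first create a Saver object. The most important argument of its __init__ method is var_list, which specifies the variables to save or restore and must be a dict or a list. In a dict, each value is an actual variable in the graph you have built, and each key is the name under which that variable is stored in the checkpoint. When restoring, the Saver uses the key to look up the value in the checkpoint and assigns it to the corresponding variable. If you pass a list, TensorFlow automatically uses each variable's op name as the key.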
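A minimal sketch of the key-to-variable mapping, assuming the tf.compat.v1 API in graph mode (TF 1.15 or TF 2.x). The checkpoint path and the variable name w are hypothetical. It saves v1 under its own name, then restores that checkpoint entry into a differently named variable by using the checkpoint key as the dict key.

```python
import os
import tempfile

import tensorflow as tf

# Use graph mode (needed on TF 2.x, harmless on TF 1.15).
tf.compat.v1.disable_eager_execution()

ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")  # hypothetical path

# Graph 1: save v1. A list is given, so its checkpoint key is v1.op.name == "v1".
with tf.Graph().as_default():
    v1 = tf.compat.v1.Variable(3.0, name="v1")
    saver = tf.compat.v1.train.Saver([v1])
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        saver.save(sess, ckpt_path)

# Graph 2: a variable with a different name, w. The dict key "v1" means
# "read checkpoint entry v1"; the dict value means "assign it to w".
with tf.Graph().as_default():
    w = tf.compat.v1.Variable(0.0, name="w")
    saver = tf.compat.v1.train.Saver({"v1": w})
    with tf.compat.v1.Session() as sess:
        saver.restore(sess, ckpt_path)  # no initializer needed: restore assigns w
        restored = sess.run(w)

print(restored)  # 3.0
```

Passing [w] as a list in the second graph would fail, because the Saver would then look for a checkpoint key named "w", which the checkpoint does not contain.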

tf.train.ExponentialMovingAverage()

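import tensorflow as tf  # TensorFlow 1.x API

# Create variables (initial values are only for illustration).
var0 = tf.Variable(1.0, name='var0')
var1 = tf.Variable(2.0, name='var1')
# Use the variables to build a training model (a toy loss for illustration).
my_loss = tf.square(var0 - 5.0) + tf.square(var1 - 5.0)
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# Create an op that applies the optimizer. This is what we usually
# would use as a training op. The second positional argument of
# minimize() is global_step, so the variables must be passed as var_list.
opt_op = opt.minimize(my_loss, var_list=[var0, var1])

# Create an ExponentialMovingAverage object
ema = tf.train.ExponentialMovingAverage(decay=0.9999)

with tf.control_dependencies([opt_op]):
    # Create the shadow variables, and add ops to maintain moving averages
    # of var0 and var1. This also creates an op that will update the moving
    # averages after each training step. This is what we will use in place
    # of the usual training op.
    training_op = ema.apply([var0, var1])

# Train the model by running training_op.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(training_op)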

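For the specified variables, this class maintains a set of shadow variables, updated with the formula shadow_variable -= (1 - decay) * (shadow_variable - variable). The shadow variables are not used in backpropagation.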
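To make the update rule concrete, here is a plain-Python sketch of repeated updates (no TensorFlow needed). Expanding the formula gives the equivalent form shadow = decay * shadow + (1 - decay) * variable, so the shadow is a weighted average that lags behind the variable. The values and decay=0.5 are made up for illustration (a realistic decay is close to 1, like the 0.9999 in the example above), and the starting value assumes, as the TF docs describe for Variables, that the shadow starts from the variable's initial value.

```python
def ema_update(shadow, variable, decay):
    """One update step, written exactly like the formula above."""
    return shadow - (1 - decay) * (shadow - variable)


# Hypothetical values a variable takes over five training steps.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
decay = 0.5
shadow = 0.0  # assumption: starts from the variable's initial value, here 0.0
history = []
for v in values:
    shadow = ema_update(shadow, v, decay)
    history.append(shadow)

print(history)  # [0.5, 1.25, 2.125, 3.0625, 4.03125]
```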
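How to use it:

First create an instance of the class. The decay rate decay is required. You can also pass num_updates, which is optional; during training it is usually set to the global step. When num_updates is given, the decay actually used is min(decay, (1 + num_updates) / (10 + num_updates)).
Call the apply method with the variables to average. It creates the shadow variables and returns an op that updates them. This update should run after each gradient-descent step has updated the variables, so you have to enforce the ordering yourself with control_dependencies.
The average method takes a single variable var and returns its shadow variable, i.e. the shadow variable maintained by this class. The shadow variable's name is normally var's name with /ExponentialMovingAverage appended.
The variables_to_restore method takes one argument, moving_avg_variables=None, and returns a dict. This dict is usually passed as var_list to tf.train.Saver(), whose restore() method then loads the previously saved parameters. The usual goal is to load the saved values whose keys are shadow-variable names into the original variables, so the entries have the form 'shadow variable name: original variable'. When moving_avg_variables is None, it defaults to trainable_variables() plus moving_average_variables(), and each of these variables is mapped from its suffixed name. Any remaining global variables, such as non-trainable variables or shadow variables maintained for ops or tensors, are included as well, under their own names. For details, see my annotated source code and my test code below.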
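The num_updates rule from the first step can be checked with a few lines of plain Python. This sketch re-implements only the formula quoted above; effective_decay is a hypothetical helper name, not a TensorFlow API.

```python
def effective_decay(decay, num_updates=None):
    """Decay actually used for one update, following the rule quoted above.

    effective_decay is a hypothetical helper for illustration, not a TF API.
    """
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


for n in [0, 10, 100, 1000]:
    print(n, effective_decay(0.99, n))
```

At num_updates = 0 the decay is only 0.1, so the shadow variable follows the variable closely early in training. With decay = 0.99, the configured value takes over once (1 + n) / (10 + n) reaches 0.99, i.e. from about n = 890 on.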

  def variables_to_restore(self, moving_avg_variables=None):
    """Returns a map of names to `Variables` to restore.

    If a variable has a moving average, use the moving average variable name as
    the restore name; otherwise, use the variable name.

    For example,

    ```python
      variables_to_restore = ema.variables_to_restore()
      saver = tf.compat.v1.train.Saver(variables_to_restore)
    ```

    Below is an example of such mapping:

    ```
      conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma,
      conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params,
      global_step: global_step
    ```

    Args:
      moving_avg_variables: a list of variables that require to use of the
        moving average variable name to be restored. If None, it will default to
        variables.moving_average_variables() + variables.trainable_variables()

    Returns:
      A map from restore_names to variables. The restore_name is either the
      original or the moving average version of the variable name, depending
      on whether the variable name is in the `moving_avg_variables`.
    """
    name_map = {}  # the dict that is finally returned
    if moving_avg_variables is None:  # no argument given: use the defaults below
      # Include trainable variables and variables which have been explicitly
      # added to the moving_average_variables collection.
      moving_avg_variables = variables.trainable_variables()
      moving_avg_variables += variables.moving_average_variables()
    # Remove duplicates
    moving_avg_variables = set(moving_avg_variables)
    # Collect all the variables with moving average,
    # Build the dict. average_name(v) returns the name of v's shadow variable,
    # and it returns a name even if v has no shadow variable.
    for v in moving_avg_variables:
      name_map[self.average_name(v)] = v
    # Make sure we restore variables without moving averages as well.
    moving_avg_variable_names = set([v.name for v in moving_avg_variables])
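    # Global variables not covered above (e.g. non-trainable ones such as
    # global_step) are added under their original names, without the suffix;
    # shadow variables maintained for tensors or ops are picked up here too.
    for v in list(set(variables.global_variables())):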
      if v.name not in moving_avg_variable_names and v.op.name not in name_map:
        name_map[v.op.name] = v
    return name_map
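A minimal end-to-end sketch of the pattern this method is built for: update a variable while keeping a shadow variable, save, then restore the shadow (averaged) value into the original variable for evaluation. It assumes the tf.compat.v1 API in graph mode (TF 1.15 or TF 2.x). The variable w, the decay of 0.5 and the checkpoint path are illustrative choices, and the "training" step is a plain assign_add standing in for an optimizer.

```python
import os
import tempfile

import tensorflow as tf

# Use graph mode (needed on TF 2.x, harmless on TF 1.15).
tf.compat.v1.disable_eager_execution()

ckpt_path = os.path.join(tempfile.mkdtemp(), "ema.ckpt")  # hypothetical path

# "Training" graph: change w, keep an EMA of it, save everything.
with tf.Graph().as_default():
    w = tf.compat.v1.Variable(0.0, name="w")
    step = tf.compat.v1.Variable(0, trainable=False, name="step")
    train_op = tf.compat.v1.assign_add(w, 1.0)  # stand-in for an optimizer step
    ema = tf.compat.v1.train.ExponentialMovingAverage(decay=0.5)
    ema_op = ema.apply([w])  # creates the shadow variable w/ExponentialMovingAverage
    saver = tf.compat.v1.train.Saver()  # default: all global variables, shadow included
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        for _ in range(3):
            sess.run(train_op)  # update w first...
            sess.run(ema_op)    # ...then move the shadow toward the new w
        raw_w, shadow_w = sess.run([w, ema.average(w)])
        saver.save(sess, ckpt_path)

# "Evaluation" graph: same variable names, but load the shadow values.
with tf.Graph().as_default():
    w = tf.compat.v1.Variable(0.0, name="w")
    step = tf.compat.v1.Variable(0, trainable=False, name="step")
    ema = tf.compat.v1.train.ExponentialMovingAverage(decay=0.5)
    restore_map = ema.variables_to_restore()
    saver = tf.compat.v1.train.Saver(restore_map)
    with tf.compat.v1.Session() as sess:
        saver.restore(sess, ckpt_path)  # assigns every variable in restore_map
        restored_w = sess.run(w)

print(sorted(restore_map))  # ['step', 'w/ExponentialMovingAverage']
print(raw_w, shadow_w, restored_w)  # 3.0 2.125 2.125
```

The evaluation graph never calls apply(), so it has no shadow variables of its own; variables_to_restore() still maps the checkpoint key w/ExponentialMovingAverage to w, which is why w ends up holding the averaged value 2.125 instead of the raw 3.0, while step is restored under its own name.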

Test code

import tensorflow as tf  # TensorFlow 1.x API (tf.trainable_variables() etc. are not top-level in TF 2.x)
v1=tf.Variable(0,dtype=tf.float32,name='v1')
v2=tf.Variable(1,dtype=tf.float32,name='v2')

v3=tf.add(v1,v2)  # the output of an op (a Tensor), not a variable
step=tf.Variable(0,trainable=False,name='step')  # non-trainable variable
ema=tf.train.ExponentialMovingAverage(0.99,step)  # create the instance, with step as num_updates
ema.apply([v1,v3])  # create shadow variables for variable v1 and tensor v3
variables_to_restore=ema.variables_to_restore()

print('non-trainable variable: variables_to_restore:',ema.variables_to_restore([step]))
# Passing the non-trainable variable step: it is mapped from its suffixed name even though it has no moving average. All other global variables, including the shadow variables, are added too, each keyed by its own name.

print('default argument: variables_to_restore:',variables_to_restore)
# Default argument: every trainable variable gets the suffix, whether or not it has a moving average. So if some variables in the saved checkpoint really have no moving-average value, you have to fix the mapping yourself.
print('no shadow variable: variables_to_restore:',ema.variables_to_restore([v2]))
# v2 has no shadow variable but still gets the suffix. All other global variables are output under their own names, so v1, which does have a moving average, appears twice: 'v1' maps to v1 and 'v1/ExponentialMovingAverage' maps to its shadow variable.
print('has shadow variable: variables_to_restore:',ema.variables_to_restore([v1]))
# This fixes the problem noted for the default argument: only v1, which has a shadow variable, gets the suffix.
print('variables:',tf.trainable_variables())  # trainable variables
print('variables_average:',ema.average(v2))  # v2's shadow variable; v2 has none, so this is None. The argument must be a single variable, not a list.
print('variables_average_name:',ema.average_name(v2))  # name of v2's shadow variable; a name is returned even though none exists
print('moving_average_variables:',tf.moving_average_variables())  # the original variables that have shadow variables
print('global_variables:',tf.global_variables())    # all global variables
print([(v.op.name,v.name) for v in tf.global_variables()])

Output

non-trainable variable: variables_to_restore: {'step/ExponentialMovingAverage': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v1/ExponentialMovingAverage': <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, 'v1': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'v2': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
default argument: variables_to_restore: {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'v2/ExponentialMovingAverage': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
no shadow variable: variables_to_restore: {'v2/ExponentialMovingAverage': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v1/ExponentialMovingAverage': <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, 'v1': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
has shadow variable: variables_to_restore: {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v2': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>, <tf.Variable 'v2:0' shape=() dtype=float32_ref>]
variables_average: None
variables_average_name: v2/ExponentialMovingAverage
moving_average_variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>]
global_variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>, <tf.Variable 'v2:0' shape=() dtype=float32_ref>, <tf.Variable 'step:0' shape=() dtype=int32_ref>, <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>]
[('v1', 'v1:0'), ('v2', 'v2:0'), ('step', 'step:0'), ('v1/ExponentialMovingAverage', 'v1/ExponentialMovingAverage:0'), ('Add/ExponentialMovingAverage', 'Add/ExponentialMovingAverage:0')]
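The test shows that passing only the shadowed variable ([v1]) keeps v2 from being suffixed. A sketch of the same fix that avoids listing variables by hand is to pass the moving_average_variables collection, which, as the output above shows, holds exactly the original variables that have shadow variables. It assumes the tf.compat.v1 API in graph mode; the graph mirrors the test code but leaves out the tensor v3, and the expected keys follow from the annotated source above.

```python
import tensorflow as tf

# Use graph mode (needed on TF 2.x, harmless on TF 1.15).
tf.compat.v1.disable_eager_execution()

with tf.Graph().as_default():
    v1 = tf.compat.v1.Variable(0.0, name="v1")  # will get a shadow variable
    v2 = tf.compat.v1.Variable(1.0, name="v2")  # trainable, but no shadow variable
    step = tf.compat.v1.Variable(0, trainable=False, name="step")
    ema = tf.compat.v1.train.ExponentialMovingAverage(0.99, step)
    ema.apply([v1])

    # Default argument: every trainable variable gets the suffix, v2 included.
    default_keys = sorted(ema.variables_to_restore())
    # Only variables that really have a shadow variable get the suffix.
    shadowed = tf.compat.v1.moving_average_variables()
    fixed_keys = sorted(ema.variables_to_restore(shadowed))

print(default_keys)  # ['step', 'v1/ExponentialMovingAverage', 'v2/ExponentialMovingAverage']
print(fixed_keys)    # ['step', 'v1/ExponentialMovingAverage', 'v2']
```

With fixed_keys, a Saver built from the returned dict loads v1 from its saved moving average but loads v2 from its own saved value, which is what you want when v2 was never averaged during training.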