Understanding tf.train.Saver() and tf.train.ExponentialMovingAverage()

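tf.train.Saver() and tf.train.ExponentialMovingAverage() are two TensorFlow classes. The first saves and restores models and their parameters; the second implements the moving-average model. The official documentation covers both (see the pages on moving averages and on model saving), and there are plenty of explanations online, so here I will just share my own understanding from using them and the problems I ran into.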

tf.train.Saver()

v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')

# Pass the variables as a dict:
saver = tf.compat.v1.train.Saver({'v1': v1, 'v2': v2})

# Or pass them as a list.
saver = tf.compat.v1.train.Saver([v1, v2])
# Passing a list is equivalent to passing a dict with the variable op names
# as keys:
saver = tf.compat.v1.train.Saver({v.op.name: v for v in [v1, v2]})

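As the code above shows, tf.train.Saver() is a class, so you first create an instance. The most important argument of its __init__ method is var_list, which takes the variables to save or restore, as either a dict or a list. With a dict, each value is an actual variable in the network you have built, and each key is the name under which that variable is saved. On restore, the key is used to look up the value in the checkpoint, and that value is loaded into the corresponding variable. If you pass a list instead, TensorFlow automatically uses each variable's op name as the key.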
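The key/value behaviour described above can be checked with a small round trip. The following is only a sketch, assuming TensorFlow 2.x with the `tensorflow.compat.v1` graph-mode API; the checkpoint key `saved_v`, the variable names and the temporary directory are made up for illustration.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode, as in TensorFlow 1.x

# Hypothetical checkpoint location inside a fresh temporary directory.
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Graph 1: save the variable named "v1" under the checkpoint key "saved_v".
with tf.Graph().as_default():
    v1 = tf.Variable(3.0, name="v1")
    saver = tf.train.Saver({"saved_v": v1})
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, ckpt_path)

# Graph 2: a differently named variable is filled from the same key.
with tf.Graph().as_default():
    other = tf.Variable(0.0, name="other")
    saver = tf.train.Saver({"saved_v": other})
    with tf.Session() as sess:
        # restore() initializes the variable, so no initializer is run here.
        saver.restore(sess, ckpt_path)
        restored_value = sess.run(other)

print(restored_value)
```

The variable names in the two graphs never have to match; only the dict key has to match what is stored in the checkpoint.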

tf.train.ExponentialMovingAverage()

# Create variables.
var0 = tf.Variable(...)
var1 = tf.Variable(...)
# ... use the variables to build a training model...
...
# Create an op that applies the optimizer.  This is what we usually
# would use as a training op.
opt_op = opt.minimize(my_loss, [var0, var1])

# Create an ExponentialMovingAverage object
ema = tf.train.ExponentialMovingAverage(decay=0.9999)

with tf.control_dependencies([opt_op]):
    # Create the shadow variables, and add ops to maintain moving averages
    # of var0 and var1. This also creates an op that will update the moving
    # averages after each training step.  This is what we will use in place
    # of the usual training op.
    training_op = ema.apply([var0, var1])

...train the model by running training_op...

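This class maintains a set of shadow variables for the variables you specify, updating them with the formula shadow_variable -= (1 - decay) * (shadow_variable - variable). The shadow variables are not used in backpropagation.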
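To get a feel for the update formula, here is a plain-Python sketch with no TensorFlow involved. The function name `ema_update`, the starting value and the decay of 0.9 are assumptions chosen only for illustration.

```python
def ema_update(shadow, variable, decay):
    """One step of: shadow -= (1 - decay) * (shadow - variable)."""
    return shadow - (1.0 - decay) * (shadow - variable)


shadow = 0.0   # the shadow value starts at the variable's initial value
history = []
for value in [10.0, 10.0, 10.0]:   # the variable jumps to 10 and stays there
    shadow = ema_update(shadow, value, decay=0.9)
    history.append(shadow)

# The shadow value creeps toward 10: roughly 1.0, 1.9, 2.71.
print(history)
```

The closer decay is to 1, the more slowly the shadow value follows the variable.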
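How to use it:

  • First create an instance of the class. The decay rate decay must be given at construction. You can also pass num_updates; it can be left at its default or set explicitly, and during training it is usually set to the global step. When it is given, min(decay, (1 + num_updates) / (10 + num_updates)) is used as the actual decay.
  • Call the apply method with the variables to average. It creates the shadow variables and returns an op that updates them. This op should normally run after each gradient-descent step has updated the variables, so you use control_dependencies to enforce the order yourself.
  • There is also the average method: given a variable var, it returns the shadow variable that this class maintains for var. The shadow variable's name is normally the name of var followed by /ExponentialMovingAverage.
  • The variables_to_restore method takes the argument moving_avg_variables=None and returns a dict. This dict is normally passed to the tf.train.Saver() constructor, so that a later restore() loads previously saved parameters. The usual goal is to load the values saved under the shadow variable names into the original variables, so the entries generally have the form 'shadow variable name: original variable'. When moving_avg_variables is None, it defaults to trainable_variables plus moving_average_variables, and the output maps the suffixed names to those variables. Non-trainable variables, and shadow variables maintained for ops or tensors, are also included under their own names. For details, see the source code I annotated below and my own test code.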
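The effect of num_updates on the decay, mentioned in the first bullet, is easy to see in plain Python. This is a sketch of the formula quoted above, not the library code; the helper name `effective_decay` is made up.

```python
def effective_decay(decay, num_updates=None):
    """Decay actually applied, following the formula quoted above."""
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


early = effective_decay(0.99, num_updates=0)      # (1 + 0) / (10 + 0) = 0.1
middle = effective_decay(0.99, num_updates=100)   # 101 / 110, about 0.918
late = effective_decay(0.99, num_updates=10000)   # capped at 0.99
print(early, middle, late)
```

Early in training the effective decay is small, so the shadow variables track the fast-changing weights closely; as the step count grows, the configured decay takes over.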
  def variables_to_restore(self, moving_avg_variables=None):
    """Returns a map of names to `Variables` to restore.

    If a variable has a moving average, use the moving average variable name as
    the restore name; otherwise, use the variable name.

    For example,

    ```python
      variables_to_restore = ema.variables_to_restore()
      saver = tf.compat.v1.train.Saver(variables_to_restore)
    ```

    Below is an example of such mapping:

    ```
      conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma,
      conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params,
      global_step: global_step
    ```

    Args:
      moving_avg_variables: a list of variables that require to use of the
        moving average variable name to be restored. If None, it will default to
        variables.moving_average_variables() + variables.trainable_variables()

    Returns:
      A map from restore_names to variables. The restore_name is either the
      original or the moving average version of the variable name, depending
      on whether the variable name is in the `moving_avg_variables`.
    """
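    name_map = {}  # dict that holds the final result
    if moving_avg_variables is None:  # with no argument given, default to the trainable variables plus the original variables explicitly added to the moving_average_variables collection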
      # Include trainable variables and variables which have been explicitly
      # added to the moving_average_variables collection.
      moving_avg_variables = variables.trainable_variables()
      moving_avg_variables += variables.moving_average_variables()
    # Remove duplicates
    moving_avg_variables = set(moving_avg_variables)
    # Collect all the variables with moving average,
    for v in moving_avg_variables:  # build the dict; average_name returns the shadow variable name for v, and it produces a name even when v has no shadow variable
      name_map[self.average_name(v)] = v
    # Make sure we restore variables without moving averages as well.
    moving_avg_variable_names = set([v.name for v in moving_avg_variables])
    for v in list(set(variables.global_variables())):  # non-trainable variables such as global_step are added too, keyed by their original name with no suffix; shadow variables kept for tensors or ops are picked up here as well
      if v.name not in moving_avg_variable_names and v.op.name not in name_map:
        name_map[v.op.name] = v
    return name_map

Test code

# Written against TensorFlow 1.x. On TensorFlow 2.x, use
# `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()`.
import tensorflow as tf
v1=tf.Variable(0,dtype=tf.float32,name='v1')
v2=tf.Variable(1,dtype=tf.float32,name='v2')

v3=tf.add(v1,v2)  # an op, not a variable
step=tf.Variable(0,trainable=False,name='step')  # non-trainable variable
ema=tf.train.ExponentialMovingAverage(0.99,step)  # create the instance
ema.apply([v1,v3])  # create shadow variables for the variable v1 and the op v3
variables_to_restore=ema.variables_to_restore()

print('Non-trainable variable: variables_to_restore:',ema.variables_to_restore([step]))
# Passing a non-trainable variable: it is mapped to its suffixed name even though it has no moving average; all other global variables, including the shadow variables, are also returned, each keyed by its own name

print('Default argument: variables_to_restore:',variables_to_restore)
# Default argument: every trainable variable gets the suffix whether or not it uses a moving average. So if some variables in the checkpoint really were not saved as moving averages, you have to adjust the map yourself
print('No shadow variable created: variables_to_restore:',ema.variables_to_restore([v2]))
# No shadow variable was created for v2, but it still gets the suffix; all other variables are returned under their own names, so a variable that does use a moving average (v1) shows up twice: once as itself and once as its shadow variable
print('Shadow variable created: variables_to_restore:',ema.variables_to_restore([v1]))
# Passing only the variables that really have shadow variables solves the problem noted for the default argument
print('variables:',tf.trainable_variables())  # trainable variables
print('variables_average:',ema.average(v2))  # the shadow variable for a variable; v2 has none, so this is None. The argument must be a single variable, not a list
print('variables_average_name:',ema.average_name(v2))  # the shadow variable name; returned even when no shadow variable exists
print('moving_average_variables:',tf.moving_average_variables())  # the original variables for which shadow variables were created
print('global_variables:',tf.global_variables())    # global variables
print([(v.op.name,v.name) for v in tf.global_variables()])

Output

Non-trainable variable: variables_to_restore: {'step/ExponentialMovingAverage': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v1/ExponentialMovingAverage': <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, 'v1': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'v2': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
Default argument: variables_to_restore: {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'v2/ExponentialMovingAverage': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
No shadow variable created: variables_to_restore: {'v2/ExponentialMovingAverage': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v1/ExponentialMovingAverage': <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, 'v1': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
Shadow variable created: variables_to_restore: {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v2': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>, <tf.Variable 'v2:0' shape=() dtype=float32_ref>]
variables_average: None
variables_average_name: v2/ExponentialMovingAverage
moving_average_variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>]
global_variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>, <tf.Variable 'v2:0' shape=() dtype=float32_ref>, <tf.Variable 'step:0' shape=() dtype=int32_ref>, <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>]
[('v1', 'v1:0'), ('v2', 'v2:0'), ('step', 'step:0'), ('v1/ExponentialMovingAverage', 'v1/ExponentialMovingAverage:0'), ('Add/ExponentialMovingAverage', 'Add/ExponentialMovingAverage:0')]
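To tie the two classes together, here is a rough end-to-end sketch: save a checkpoint that contains a shadow variable, then use variables_to_restore() so that the averaged value is loaded into the original variable of a fresh graph. It assumes TensorFlow 2.x with `tensorflow.compat.v1`; the variable name `w`, the decay of 0.5 and the temporary path are made up. For simplicity the "training step" and the moving-average update are run as two separate sess.run calls rather than chained with control_dependencies.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode, as in TensorFlow 1.x

ckpt_path = os.path.join(tempfile.mkdtemp(), "ema.ckpt")  # hypothetical path

# "Training" graph: w goes from 0 to 10, then the shadow variable is updated.
with tf.Graph().as_default():
    w = tf.Variable(0.0, name="w")
    set_w = tf.assign(w, 10.0)            # stands in for an optimizer step
    ema = tf.train.ExponentialMovingAverage(decay=0.5)
    ema_op = ema.apply([w])               # creates w/ExponentialMovingAverage
    saver = tf.train.Saver()              # saves both w and its shadow variable
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(set_w)
        sess.run(ema_op)                  # shadow = 0 - 0.5 * (0 - 10) = 5
        raw, avg = sess.run([w, ema.average(w)])
        saver.save(sess, ckpt_path)

# "Evaluation" graph: load the averaged value into the original variable.
with tf.Graph().as_default():
    w = tf.Variable(0.0, name="w")
    ema = tf.train.ExponentialMovingAverage(decay=0.5)
    restore_map = ema.variables_to_restore()
    restore_keys = sorted(restore_map)    # ['w/ExponentialMovingAverage']
    saver = tf.train.Saver(restore_map)
    with tf.Session() as sess:
        saver.restore(sess, ckpt_path)
        restored_w = sess.run(w)

print(raw, avg, restore_keys, restored_w)
```

In the evaluation graph apply() is never called; the ExponentialMovingAverage object is only there to generate the suffixed names, which is why w ends up holding the averaged value instead of the raw one.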