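tf.train.Saver() and tf.train.ExponentialMovingAverage() are two TensorFlow classes: the first saves models and their parameters, and the second implements the moving average model. The official documentation is here: moving average model and model saving. There are already many explanations online, so I will just describe my own understanding, based on how I have used these classes, and the problems I ran into.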
tf.train.Saver()
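```python
import tensorflow as tf

# Run in TF1-style graph mode (v.op.name below is only defined in graph mode under TF 2.x).
tf.compat.v1.disable_eager_execution()

v1 = tf.Variable(0.0, name='v1')  # illustrative initial values
v2 = tf.Variable(1.0, name='v2')
# Pass the variables as a dict:
saver = tf.compat.v1.train.Saver({'v1': v1, 'v2': v2})
# Or pass them as a list.
saver = tf.compat.v1.train.Saver([v1, v2])
# Passing a list is equivalent to passing a dict with the variable op names
# as keys:
saver = tf.compat.v1.train.Saver({v.op.name: v for v in [v1, v2]})
```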
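As shown above, tf.train.Saver() is a class, so you first create an instance. The most important argument of its __init__ method is var_list, which specifies the variables to save or restore and must be a dict or a list. In the dict form, each value is an actual variable in the network you have built, and each key is the name under which that variable is saved. On restore, the key is used to look up the value in the checkpoint, which is then loaded into the corresponding variable. If you pass a list instead, TF automatically uses each variable's op name as the key.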
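Because restore looks values up by key, the dict form also lets you load a checkpoint entry into a variable with a different name. A minimal sketch, assuming TF 2.x running the TF1 API through tf.compat.v1 in graph mode; the variable name w and the temporary checkpoint path are illustrative only:

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # Saver/Session-based checkpointing is a graph-mode API

ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# 1) Build a graph and save v1 under the checkpoint key 'v1'.
with tf.Graph().as_default():
    v1 = tf1.Variable(3.0, name="v1")
    saver = tf1.train.Saver({"v1": v1})  # key 'v1' -> variable v1
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        saver.save(sess, ckpt_path)

# 2) In a fresh graph, load the checkpoint entry 'v1' into a variable named 'w'.
with tf.Graph().as_default():
    w = tf1.Variable(0.0, name="w")
    restorer = tf1.train.Saver({"v1": w})  # look up key 'v1', assign it to w
    with tf1.Session() as sess:
        restorer.restore(sess, ckpt_path)  # restore takes the place of initialization for w
        restored = sess.run(w)

print(restored)  # 3.0
```

The name of the variable in the new graph does not matter: the dict alone tells the Saver which checkpoint key goes into which variable.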
tf.train.ExponentialMovingAverage()
```python
# Create variables.
var0 = tf.Variable(...)
var1 = tf.Variable(...)
# ... use the variables to build a training model...
...
# Create an op that applies the optimizer. This is what we usually
# would use as a training op.
opt_op = opt.minimize(my_loss, [var0, var1])

# Create an ExponentialMovingAverage object
ema = tf.train.ExponentialMovingAverage(decay=0.9999)

with tf.control_dependencies([opt_op]):
    # Create the shadow variables, and add ops to maintain moving averages
    # of var0 and var1. This also creates an op that will update the moving
    # averages after each training step. This is what we will use in place
    # of the usual training op.
    training_op = ema.apply([var0, var1])

# ...train the model by running training_op...
```
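For the specified variables, this class maintains a set of shadow variables updated by the formula shadow_variable -= (1 - decay) * (shadow_variable - variable), which is the same as shadow_variable = decay * shadow_variable + (1 - decay) * variable. The shadow variables are not used in the actual backpropagation.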
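The two forms of the update are algebraically identical, which a few lines of plain Python can confirm. This is a sketch of the formula only, not TensorFlow's implementation; the sequence of variable values and the starting shadow value 0.0 are made up for illustration:

```python
def ema_update(shadow, variable, decay):
    """One shadow-variable update, written as in the TF documentation."""
    return shadow - (1 - decay) * (shadow - variable)


def ema_update_weighted(shadow, variable, decay):
    """The same update written as a weighted average."""
    return decay * shadow + (1 - decay) * variable


shadow = 0.0  # assumed starting value of the shadow variable
for variable in [1.0, 2.0, 3.0]:  # hypothetical values of the variable after each step
    a = ema_update(shadow, variable, 0.9)
    b = ema_update_weighted(shadow, variable, 0.9)
    assert abs(a - b) < 1e-12  # both forms agree up to rounding
    shadow = a

print(round(shadow, 6))  # 0.561
```

With decay = 0.9 the shadow value (0.561) lags well behind the latest value (3.0); a decay closer to 1, such as 0.9999, smooths even more.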
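How to use it:
- First create an instance. The decay rate decay must be given at construction time (required). num_updates is optional: you can leave it at its default or set it yourself, and during training it is usually set to the global step. When num_updates is given, the decay actually used is min(decay, (1 + num_updates) / (10 + num_updates)).
- Call the apply method with the variables to be averaged; it creates their shadow variables and returns an op that updates them. This update should normally run after each gradient step has updated the variables, so you have to enforce the order yourself with control_dependencies.
- The average method takes a single variable var and returns the shadow variable that this class maintains for it. The shadow variable's name is usually var's name followed by /ExponentialMovingAverage.
- The variables_to_restore method takes moving_avg_variables=None and returns a dict. This dict is usually passed as var_list to tf.train.Saver(), whose restore() then loads the previously saved values. The usual goal is to load the saved values whose keys are shadow variable names into the original variables, so each entry has the form 'shadow variable name: original variable'. When moving_avg_variables is None, it defaults to trainable_variables() plus moving_average_variables(), and each of these variables is keyed by its suffixed name. Any remaining global variables, such as non-trainable variables or shadow variables maintained for ops or tensors, are included too, keyed by their own names. For details, see my annotated source code and my own test code below.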
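The num_updates rule from the first item is easy to tabulate in plain Python. The helper effective_decay below is a hypothetical name that only restates the formula quoted above; it is not part of the TensorFlow API:

```python
def effective_decay(decay, num_updates=None):
    """Decay used in one update: min(decay, (1 + num_updates) / (10 + num_updates)).

    Without num_updates the configured decay is used as-is.
    """
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


for step in [0, 10, 100, 1000]:
    print(step, round(effective_decay(0.99, step), 4))
# 0 0.1
# 10 0.55
# 100 0.9182
# 1000 0.99
```

Early in training the effective decay is small, so the shadow variable follows the variable closely; with decay = 0.99 the configured value takes over from about step 890 on, where (1 + n) / (10 + n) reaches 0.99.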
````python
def variables_to_restore(self, moving_avg_variables=None):
    """Returns a map of names to `Variables` to restore.

    If a variable has a moving average, use the moving average variable name as
    the restore name; otherwise, use the variable name.

    For example,

    ```python
      variables_to_restore = ema.variables_to_restore()
      saver = tf.compat.v1.train.Saver(variables_to_restore)
    ```

    Below is an example of such mapping:

    ```
      conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma,
      conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params,
      global_step: global_step
    ```

    Args:
      moving_avg_variables: a list of variables that require to use of the
        moving average variable name to be restored. If None, it will default to
        variables.moving_average_variables() + variables.trainable_variables()

    Returns:
      A map from restore_names to variables. The restore_name is either the
      original or the moving average version of the variable name, depending
      on whether the variable name is in the `moving_avg_variables`.
    """
    name_map = {}  # dict holding the final output
    if moving_avg_variables is None:
        # No argument given: default to the trainable variables plus the original
        # variables explicitly registered in the moving-average collection.
        # Include trainable variables and variables which have been explicitly
        # added to the moving_average_variables collection.
        moving_avg_variables = variables.trainable_variables()
        moving_avg_variables += variables.moving_average_variables()
    # Remove duplicates
    moving_avg_variables = set(moving_avg_variables)
    # Collect all the variables with moving average,
    # average_name(v) returns the name of v's shadow variable; it returns a name
    # even when v has no shadow variable.
    for v in moving_avg_variables:
        name_map[self.average_name(v)] = v
    # Make sure we restore variables without moving averages as well.
    moving_avg_variable_names = set([v.name for v in moving_avg_variables])
    # Every other global variable (e.g. non-trainable ones such as global_step) is
    # added under its original name, without the suffix; shadow variables kept
    # for tensors or ops are picked up here as well.
    for v in list(set(variables.global_variables())):
        if v.name not in moving_avg_variable_names and v.op.name not in name_map:
            name_map[v.op.name] = v
    return name_map
````
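The typical use of variables_to_restore is to train with the raw variables, save everything, and then load the shadow values into the original variables for evaluation. A minimal end-to-end sketch, assuming TF 2.x with the TF1 API via tf.compat.v1 in graph mode; tf1.assign_add stands in for a real optimizer step, and the variable name w and the checkpoint path are illustrative:

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # Saver and ExponentialMovingAverage.apply used in graph mode

ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Training graph: w is updated directly; its moving average lives in a shadow variable.
with tf.Graph().as_default():
    w = tf1.Variable(0.0, name="w")
    train_step = tf1.assign_add(w, 1.0)  # stand-in for opt.minimize(...)
    ema = tf1.train.ExponentialMovingAverage(decay=0.9)
    with tf.control_dependencies([train_step]):
        training_op = ema.apply([w])  # update the shadow only after w has changed
    shadow_name = ema.average(w).op.name  # name of w's shadow variable
    saver = tf1.train.Saver()  # saves both w and its shadow variable
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        for _ in range(3):
            sess.run(training_op)
        saver.save(sess, ckpt_path)

# Evaluation graph: only w exists; load the saved shadow value into it.
with tf.Graph().as_default():
    w = tf1.Variable(0.0, name="w")
    ema = tf1.train.ExponentialMovingAverage(decay=0.9)
    restore_map = ema.variables_to_restore()  # maps 'w/ExponentialMovingAverage' to w
    restore_keys = sorted(restore_map)
    with tf1.Session() as sess:
        tf1.train.Saver(restore_map).restore(sess, ckpt_path)
        restored = sess.run(w)

print(shadow_name)   # w/ExponentialMovingAverage
print(restore_keys)  # ['w/ExponentialMovingAverage']
print(restored)      # about 0.561: the averaged value, not the raw value 3.0
```

The evaluation graph never calls apply, yet average_name still produces the suffixed name, which is exactly what lets the Saver find the shadow value in the checkpoint.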
Test code
```python
import tensorflow as tf

v1 = tf.Variable(0, dtype=tf.float32, name='v1')
v2 = tf.Variable(1, dtype=tf.float32, name='v2')
v3 = tf.add(v1, v2)  # an op (tensor), not a variable
step = tf.Variable(0, trainable=False, name='step')  # non-trainable variable
ema = tf.train.ExponentialMovingAverage(0.99, step)  # create the instance; step is num_updates
ema.apply([v1, v3])  # create shadow variables for variable v1 and tensor v3
variables_to_restore = ema.variables_to_restore()
print('non-trainable variable: variables_to_restore:', ema.variables_to_restore([step]))
# Passing the non-trainable variable: step is keyed by its suffixed name even though
# it has no moving average; all other global variables, shadow variables included,
# are also output, each keyed by its own name.
print('default argument: variables_to_restore:', variables_to_restore)
# Default argument: every trainable variable gets the suffix whether or not it has a
# moving average. If some saved values really are not moving averages, you have to
# fix the map yourself.
print('no shadow variable: variables_to_restore:', ema.variables_to_restore([v2]))
# v2 has no shadow variable but still gets the suffix; all other variables are output
# under their own names, so v1, which does have a moving average, appears twice
# (as v1 and as its shadow variable).
print('has shadow variable: variables_to_restore:', ema.variables_to_restore([v1]))
# This avoids the problem noted for the default argument.
print('variables:', tf.trainable_variables())  # trainable variables
print('variables_average:', ema.average(v2))  # v2's shadow variable: None here; takes one variable, not a list
print('variables_average_name:', ema.average_name(v2))  # shadow variable name, returned even though none exists
print('moving_average_variables:', tf.moving_average_variables())  # original variables that have shadow variables
print('global_variables:', tf.global_variables())  # global variables
print([(v.op.name, v.name) for v in tf.global_variables()])
```
Output
```
non-trainable variable: variables_to_restore: {'step/ExponentialMovingAverage': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v1/ExponentialMovingAverage': <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, 'v1': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'v2': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
default argument: variables_to_restore: {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'v2/ExponentialMovingAverage': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
no shadow variable: variables_to_restore: {'v2/ExponentialMovingAverage': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v1/ExponentialMovingAverage': <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, 'v1': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
has shadow variable: variables_to_restore: {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>, 'step': <tf.Variable 'step:0' shape=() dtype=int32_ref>, 'v2': <tf.Variable 'v2:0' shape=() dtype=float32_ref>, 'Add/ExponentialMovingAverage': <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>}
variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>, <tf.Variable 'v2:0' shape=() dtype=float32_ref>]
variables_average: None
variables_average_name: v2/ExponentialMovingAverage
moving_average_variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>]
global_variables: [<tf.Variable 'v1:0' shape=() dtype=float32_ref>, <tf.Variable 'v2:0' shape=() dtype=float32_ref>, <tf.Variable 'step:0' shape=() dtype=int32_ref>, <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>, <tf.Variable 'Add/ExponentialMovingAverage:0' shape=() dtype=float32_ref>]
[('v1', 'v1:0'), ('v2', 'v2:0'), ('step', 'step:0'), ('v1/ExponentialMovingAverage', 'v1/ExponentialMovingAverage:0'), ('Add/ExponentialMovingAverage', 'Add/ExponentialMovingAverage:0')]
```
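Instead of listing the averaged variables by hand, one way to avoid suffixing variables that have no shadow is to pass tf.moving_average_variables(), the collection that apply fills with the variables it actually averages (v1 in the output above). A sketch assuming TF 2.x with the TF1 API via tf.compat.v1 in graph mode, using an illustrative two-variable graph:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # graph collections such as moving_average_variables are graph-mode features

with tf.Graph().as_default():
    v1 = tf1.Variable(0.0, name="v1")  # will get a shadow variable
    v2 = tf1.Variable(1.0, name="v2")  # will not
    ema = tf1.train.ExponentialMovingAverage(decay=0.99)
    ema.apply([v1])  # also adds v1 to the moving_average_variables collection
    default_map = ema.variables_to_restore()
    fixed_map = ema.variables_to_restore(tf1.moving_average_variables())

print(sorted(default_map))  # ['v1/ExponentialMovingAverage', 'v2/ExponentialMovingAverage']
print(sorted(fixed_map))    # ['v1/ExponentialMovingAverage', 'v2']
```

This relies on every averaged variable having gone through apply; the returned dict still contains all other global variables under their own names, as in the test output above.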