TensorFlow Study Notes (33): ExponentialMovingAverage

u012436149

Article link: https://blog.csdn.net/u012436149/article/details/56484572

ExponentialMovingAverage

Some training algorithms, such as GradientDescent and Momentum often benefit from maintaining a moving average of variables during optimization. Using the moving averages for evaluations often improve results significantly.
This is how the official TensorFlow documentation describes the method: training with GradientDescent or Momentum can benefit from ExponentialMovingAverage.

What is a MovingAverage?
Suppose we have a time series $\{a_1, a_2, a_3, ..., a_{t-1}, a_t, ...\}$. The MovingAverage of this series is then:

$mv_t = decay*mv_{t-1}+(1-decay)* a_t$
This is a recursive expression.
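The recursion can be sketched in plain Python. This is only an illustration, not TensorFlow's implementation; the function name `moving_average` and the starting value `mv0 = 0.0` are assumptions made for the example.

```python
def moving_average(series, decay, mv0=0.0):
    """Return [mv_1, ..., mv_t] for mv_t = decay * mv_{t-1} + (1 - decay) * a_t."""
    mv = mv0  # assumed starting value mv_0
    history = []
    for a in series:
        mv = decay * mv + (1.0 - decay) * a
        history.append(mv)
    return history


values = moving_average([1.0, 2.0, 3.0, 4.0], decay=0.9)
print(values)
```

With a decay close to 1 the average moves slowly and each new sample contributes only a small fraction `1 - decay`.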
How should this formula be understood?
It behaves like a sliding window: the value of $mv_t$ depends only on the $a_i$ inside that window. Why? Unroll the recursion:

$\begin{aligned} mv_t &= (1-decay)* a_t+decay*mv_{t-1} \\ mv_{t-1} &= (1-decay)* a_{t-1}+decay*mv_{t-2} \\ mv_{t-2} &= (1-decay)* a_{t-2}+decay*mv_{t-3}\\ &... \end{aligned}$
Taking $mv_0 = 0$, this gives:

$mv_t = \sum_{i=1}^tdecay^{t-i}* (1-decay)* a_i$
(For a nonzero $mv_0$ there is one additional term, $decay^{t}*mv_0$.)
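As a quick numerical check, the recursive form and the unrolled sum can be compared directly. The series and the decay value below are arbitrary choices for illustration, and the recursion is started from `mv_0 = 0`.

```python
decay = 0.9
series = [1.0, 2.0, 3.0, 4.0, 5.0]  # a_1 ... a_t (arbitrary example values)

# Recursive form, starting from mv_0 = 0.
mv_recursive = 0.0
for a in series:
    mv_recursive = decay * mv_recursive + (1.0 - decay) * a

# Unrolled form: sum over i = 1..t of decay^(t-i) * (1 - decay) * a_i.
t = len(series)
mv_closed = sum(
    decay ** (t - i) * (1.0 - decay) * a
    for i, a in enumerate(series, start=1)
)

print(mv_recursive, mv_closed)
```

The two numbers agree up to floating-point rounding.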
When $t-i>C$, where $C$ is some sufficiently large number (and $0 < decay < 1$ with bounded $a_i$), $decay^{t-i}* (1-decay)* a_i \approx 0$, so:

$mv_t\approx \sum_{i=t-C}^tdecay^{t-i}* (1-decay)* a_i$
That is, the value of $mv_t$ effectively depends only on $\{a_{t-C},...,a_t\}$.
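How large $C$ has to be depends on the decay. A rough sketch: pick a tolerance `eps` (an assumed value, not something the documentation prescribes), take the smallest `C` with `decay**C <= eps`, and compare the full sum with the sum truncated to the last `C + 1` samples. The helper names are hypothetical.

```python
import math


def window_size(decay, eps=1e-3):
    """Smallest C such that decay**C <= eps (eps is an assumed tolerance)."""
    return math.ceil(math.log(eps) / math.log(decay))


def ema_full(series, decay):
    """Moving average over the whole series, starting from mv_0 = 0."""
    mv = 0.0
    for a in series:
        mv = decay * mv + (1.0 - decay) * a
    return mv


def ema_truncated(series, decay, C):
    """Same sum, but using only a_{t-C}, ..., a_t."""
    t = len(series)
    start = max(1, t - C)
    return sum(
        decay ** (t - i) * (1.0 - decay) * series[i - 1]
        for i in range(start, t + 1)
    )


decay = 0.9
series = [math.sin(i) for i in range(500)]  # bounded example data, |a_i| <= 1
C = window_size(decay)
full = ema_full(series, decay)
truncated = ema_truncated(series, decay, C)
print(C, full, truncated)
```

The closer the decay is to 1, the larger the window: the same tolerance needs far more samples at `decay=0.9999` than at `decay=0.9`.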

ExponentialMovingAverage in TensorFlow

With that in mind, look again at the formula in the official documentation:

$\text{shadowVariable} = decay * \text{shadowVariable} + (1 - decay) * variable$
It is now clear what each term means: shadowVariable plays the role of $mv$, and variable plays the role of $a_t$.
Shadow variables are created with trainable=False and are used to hold the EMA values.

import tensorflow as tf  # TensorFlow 1.x API
w = tf.Variable(1.0)
ema = tf.train.ExponentialMovingAverage(0.9)
update = tf.assign_add(w, 1.0)

with tf.control_dependencies([update]):
    # Returns an op that updates the moving average, i.e. the shadow value.
    ema_op = ema.apply([w])  # must come before the ema.average(w) call below
# Look up the shadow value, using w as the key.
ema_val = ema.average(w)  # the argument cannot be a list, which is a little inconvenient

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for i in range(3):
        sess.run(ema_op)
        print(sess.run(ema_val))
# w runs through the time series 1 2 3 4
# Output:
# 1.1      =0.9*1 + 0.1*2
# 1.29     =0.9*1.1+0.1*3
# 1.561    =0.9*1.29+0.1*4

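You may wonder why the series has 4 numbers when the loop only runs three times.
The reason is that when the program reaches ema_op = ema.apply([w]), if w is a Variable, the EMA value kept for w is initialized with the initial value of w, so $emaVal_0=1.0$. If w is a Tensor, it is initialized to 0.0 instead.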
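The effect of the starting value can be reproduced without TensorFlow. The sketch below only replays the update formula with two different starting values; it does not model TensorFlow's internals (for example, any bias correction applied on top), and `run_ema` is a hypothetical helper.

```python
def run_ema(initial_shadow, values, decay=0.9):
    """Apply shadow = decay * shadow + (1 - decay) * value for each value."""
    shadow = initial_shadow
    out = []
    for v in values:
        shadow = decay * shadow + (1.0 - decay) * v
        out.append(round(shadow, 6))
    return out


# w starts at 1 and becomes 2, 3, 4 after each assign_add.
variable_init = run_ema(1.0, [2.0, 3.0, 4.0])  # shadow starts at w's initial value
zero_init = run_ema(0.0, [2.0, 3.0, 4.0])      # shadow starts at 0.0

print(variable_init)  # [1.1, 1.29, 1.561]
print(zero_init)
```

Starting from the initial value of w reproduces the three printed numbers; starting from 0.0 gives a noticeably lower average over the first few steps.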

Example from the official documentation:

# Create variables.
var0 = tf.Variable(...)
var1 = tf.Variable(...)
# ... use the variables to build a training model...
...
# Create an op that applies the optimizer.  This is what we usually
# would use as a training op.
opt_op = opt.minimize(my_loss, var_list=[var0, var1])

# Create an ExponentialMovingAverage object
ema = tf.train.ExponentialMovingAverage(decay=0.9999)

# Create the shadow variables, and add ops to maintain moving averages
# of var0 and var1.
maintain_averages_op = ema.apply([var0, var1])

# Create an op that will update the moving averages after each training
# step.  This is what we will use in place of the usual training op.
with tf.control_dependencies([opt_op]):
    training_op = tf.group(maintain_averages_op)
    # Run this op to get the current EMA value.
    get_var0_average_op = ema.average(var0)
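The ordering in this pattern (optimizer step first, moving-average update second) can be illustrated with a framework-free toy. Everything here is an assumption made for the sketch: the quadratic loss, the learning rate, and a decay of 0.9 chosen so that the toy converges within a few hundred steps.

```python
# Toy problem (assumed for illustration): minimize (w - 3)^2 by gradient descent.
decay = 0.9
learning_rate = 0.1
w = 0.0
shadow = w  # the shadow value starts at the variable's initial value

for step in range(200):
    grad = 2.0 * (w - 3.0)
    w -= learning_rate * grad                    # plays the role of opt_op
    shadow = decay * shadow + (1.0 - decay) * w  # plays the role of maintain_averages_op

print(w, shadow)
```

The shadow value trails the raw parameter and approaches the same optimum, which is why it can stand in for the parameter at evaluation time.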

Using the ExponentialMovingAveraged parameters

Suppose we trained a neural network with ExponentialMovingAverage. How do we use the ExponentialMovingAveraged parameters at test time? The official documentation answers this as well.
Method 1:

# Create a Saver that loads variables from their saved shadow values.
shadow_var0_name = ema.average_name(var0)
shadow_var1_name = ema.average_name(var1)
saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1})
saver.restore(...checkpoint filename...)
# var0 and var1 now hold the moving average values

Method 2:

#Returns a map of names to Variables to restore.
variables_to_restore = ema.variables_to_restore()
saver = tf.train.Saver(variables_to_restore)
...
saver.restore(...checkpoint filename...)

One thing to watch out for: the saver used for saving checkpoints must not be written this way. See http://blog.csdn.net/u012436149/article/details/56665612
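Both restore methods rest on the same idea: a mapping from the shadow value's name in the checkpoint to the model variable that should receive it. The dictionary-based sketch below imitates that idea only; the checkpoint contents, the numeric values, and the `/ExponentialMovingAverage` name suffix are illustrative assumptions, and no TensorFlow API is involved.

```python
# Hypothetical checkpoint contents: raw values and their moving averages.
checkpoint = {
    "var0": 0.52,
    "var1": -1.30,
    "var0/ExponentialMovingAverage": 0.50,
    "var1/ExponentialMovingAverage": -1.25,
}

# Restore map: checkpoint key (shadow name) -> model variable name.
restore_map = {
    "var0/ExponentialMovingAverage": "var0",
    "var1/ExponentialMovingAverage": "var1",
}

model = {"var0": 0.0, "var1": 0.0}  # stand-in for the graph's variables
for checkpoint_key, variable_name in restore_map.items():
    model[variable_name] = checkpoint[checkpoint_key]

print(model)  # {'var0': 0.5, 'var1': -1.25}
```

After the loop, the model variables hold the averaged values rather than the raw ones, which is the state wanted for evaluation.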