TensorFlow moving average model: ExponentialMovingAverage

____tz_zs's study notes



The moving average model improves, to some degree, the performance of neural networks trained with GradientDescent or Momentum.

Principle: during training, continuously maintain and update a moving average of every parameter; at validation and test time, use each parameter's moving-average value instead of its latest trained value. This can effectively improve the network's accuracy.

tf.train.ExponentialMovingAverage

TensorFlow documentation: https://www.tensorflow.org/versions/r0.12/api_docs/python/train/moving_averages


TensorFlow provides tf.train.ExponentialMovingAverage to implement the moving average model. It uses exponential decay to compute the moving average of a variable.

tf.train.ExponentialMovingAverage.__init__(self, decay, num_updates=None, zero_debias=False, name="ExponentialMovingAverage"):

decay is the decay rate.

num_updates is a parameter that ExponentialMovingAverage provides for setting decay dynamically. When it is supplied at construction time (i.e. not None), the decay rate used at each update is:

min { decay , ( 1 + num_updates ) / ( 10 + num_updates ) }
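
As a quick illustration, here is a minimal pure-Python sketch of this rule (effective_decay is a hypothetical helper for illustration, not part of the TensorFlow API):

def effective_decay(decay, num_updates=None):
    # Cap the dynamic rate at the configured decay, as in the formula above.
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))

print(effective_decay(0.99, 0))      # 0.1  -- averages move fast early in training
print(effective_decay(0.99, 10000))  # 0.99 -- capped by the configured decay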


The apply() method adds shadow copies of the trained variables, together with an op that maintains the moving averages of those variables in their shadow copies. Run this op after each training step to update the moving averages.

The average() and average_name() methods give access to the shadow variables and their names.



When creating an ExponentialMovingAverage object, you must specify the decay rate (decay), which controls how quickly the averages update. Each shadow variable is initialized with the same value as its trained variable. Whenever the update op runs, every shadow variable is updated as:

shadow_variable = decay * shadow_variable + (1 - decay) * variable


Reasonable values for decay are close to 1, typically 0.999, 0.9999, and so on.
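
To build intuition, the following minimal pure-Python sketch applies the update rule above to a constant signal, and shows why a large decay leaves the average biased toward its initialization early on (the problem that num_updates and zero_debias address):

def ema_update(shadow, value, decay):
    # The update rule above.
    return decay * shadow + (1 - decay) * value

for decay in (0.9, 0.99):
    shadow = 0.0
    for _ in range(100):
        shadow = ema_update(shadow, 5.0, decay)
    print(decay, round(shadow, 4))
# 0.9  -> 4.9999 (essentially converged to the signal)
# 0.99 -> 3.1698 (still strongly biased toward the 0 initialization)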


Understanding how the moving average works

# -*- coding: utf-8 -*-
"""
@author: tz_zs

Moving average model
"""
import tensorflow as tf

# Define a variable whose moving average will be computed
v1 = tf.Variable(0, dtype=tf.float32)
# Define a variable step for the iteration count, used to control the decay rate dynamically
step = tf.Variable(0, trainable=False)

# Create the moving average object
ema = tf.train.ExponentialMovingAverage(0.99, step)

# Define the op that maintains the moving averages; the argument is a list
maintain_average_op = ema.apply([v1])

with tf.Session() as sess:
    # Initialize all variables
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    # ema.average(v1) returns the moving-average value of the variable
    # print(sess.run(v1))  # 0.0
    # print(sess.run([ema.average_name(v1), ema.average(v1)]))  # [None, 0.0]
    print(sess.run([v1, ema.average(v1)]))  # [0.0, 0.0]

    # Set the value of v1 to 5
    sess.run(tf.assign(v1, 5))
    # Update v1's moving average. The decay rate is
    # min { decay , ( 1 + num_updates ) / ( 10 + num_updates ) } = min { 0.99, 1/10 } = 0.1,
    # so the moving average becomes 0.1 * 0 + 0.9 * 5 = 4.5
    sess.run(maintain_average_op)
    # print(sess.run(v1))  # 5.0
    # print(sess.run([ema.average_name(v1), ema.average(v1)]))  # [None, 4.5]
    print(sess.run([v1, ema.average(v1)]))  # [5.0, 4.5]

    # Set step to 10000 to simulate the iteration count
    sess.run(tf.assign(step, 10000))
    # Set the value of v1 to 10
    sess.run(tf.assign(v1, 10))
    # Update v1's moving average. min { decay , ( 1 + num_updates ) / ( 10 + num_updates ) } now gives 0.99,
    # so the moving average becomes 0.99 * 4.5 + 0.01 * 10 = 4.555
    sess.run(maintain_average_op)
    print(sess.run([v1, ema.average(v1)]))  # [10.0, 4.5549998]

    # Updating again gives 0.99 * 4.555 + 0.01 * 10 = 4.60945
    sess.run(maintain_average_op)
    print(sess.run([v1, ema.average(v1)]))  # [10.0, 4.6094499]


The next snippet shows that apply() creates a shadow variable (v/ExponentialMovingAverage) and adds it to the graph's global variables:

# -*- coding: utf-8 -*-
"""
@author: tz_zs

"""
import tensorflow as tf

v1 = tf.Variable(10, dtype=tf.float32, name="v")

for variables in tf.global_variables():  # all_variables() is deprecated
    print(variables)  # <tf.Variable 'v:0' shape=() dtype=float32_ref>

ema = tf.train.ExponentialMovingAverage(0.99)
print(ema)  # <tensorflow.python.training.moving_averages.ExponentialMovingAverage object at 0x00000218AE5720F0>

maintain_averages_op = ema.apply(tf.global_variables())
for variables in tf.global_variables():
    print(variables)  # <tf.Variable 'v:0' shape=() dtype=float32_ref>
    # <tf.Variable 'v/ExponentialMovingAverage:0' shape=() dtype=float32_ref>

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(tf.assign(v1, 1))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))  # [1.0, 9.9099998]  (0.99 * 10 + 0.01 * 1 = 9.91)

Saving and loading moving-average values (persistence)

# -*- coding: utf-8 -*-
"""
@author: tz_zs

Saving and loading moving-average values (persistence)
"""
import tensorflow as tf

v1 = tf.Variable(10, dtype=tf.float32, name="v1")

for variables in tf.global_variables():  # all_variables() is deprecated
    print(variables)  # <tf.Variable 'v1:0' shape=() dtype=float32_ref>

ema = tf.train.ExponentialMovingAverage(0.99)
print(ema)  # <tensorflow.python.training.moving_averages.ExponentialMovingAverage object at 0x00000218AE5720F0>

maintain_averages_op = ema.apply(tf.global_variables())
for variables in tf.global_variables():
    print(variables)  # <tf.Variable 'v1:0' shape=() dtype=float32_ref>
    # <tf.Variable 'v1/ExponentialMovingAverage:0' shape=() dtype=float32_ref>

saver = tf.train.Saver()
print(saver)  # <tensorflow.python.training.saver.Saver object at 0x0000026B7E591940>
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(tf.assign(v1, 1))
    sess.run(maintain_averages_op)
    print(sess.run([v1, ema.average(v1)]))  # [1.0, 9.9099998]

    print(saver.save(sess, "/path/to/model.ckpt"))  # persist to disk; save() returns the path /path/to/model.ckpt
#################################################################################################
print("#####" * 10)
print("load")
#################################################################################################
var2 = tf.Variable(0, dtype=tf.float32, name="v2")  # <tf.Variable 'v2:0' shape=() dtype=float32_ref>
print(var2)
saver2 = tf.train.Saver({"v1/ExponentialMovingAverage": var2})
with tf.Session() as sess2:
    saver2.restore(sess2, "/path/to/model.ckpt")
    print(sess2.run(var2))  # 9.91 -- v1's moving average was loaded successfully

Loading can also be done with the variables_to_restore() method that TensorFlow provides:

var3 = tf.Variable(0, dtype=tf.float32, name="v1")
print(var3)  # <tf.Variable 'v1:0' shape=() dtype=float32_ref>
ema = tf.train.ExponentialMovingAverage(0.99)

print(ema.variables_to_restore())  # {'v1/ExponentialMovingAverage': <tf.Variable 'v1:0' shape=() dtype=float32_ref>}
saver = tf.train.Saver(ema.variables_to_restore())
with tf.Session() as sess:
    saver.restore(sess, "/path/to/model.ckpt")
    print(sess.run(var3))  # 9.91
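
For completeness, here is a sketch of the canonical way to wire the update op into a training step, following the usage example in the class docstring in Appendix 1; opt, my_loss, var0 and var1 are assumed to be defined elsewhere:

# Assumed to exist: an optimizer opt, a loss tensor my_loss, variables var0/var1.
opt_op = opt.minimize(my_loss, var_list=[var0, var1])

ema = tf.train.ExponentialMovingAverage(decay=0.9999)
maintain_averages_op = ema.apply([var0, var1])

# Run the moving-average update only after the optimizer step completes.
with tf.control_dependencies([opt_op]):
    train_op = tf.group(maintain_averages_op)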





Appendix 1: source code of moving_averages.py from TensorFlow 1.2

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Maintain moving averages of parameters."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.python.framework import dtypes
from tensorflow.python.framework import ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import state_ops
from tensorflow.python.ops import variable_scope
from tensorflow.python.ops import variables
from tensorflow.python.training import slot_creator


# TODO(touts): switch to variables.Variable.
def assign_moving_average(variable, value, decay, zero_debias=True, name=None):
  """Compute the moving average of a variable.

  The moving average of 'variable' updated with 'value' is:
    variable * decay + value * (1 - decay)

  The returned Operation sets 'variable' to the newly computed moving average.

  The new value of 'variable' can be set with the 'AssignSub' op as:
     variable -= (1 - decay) * (variable - value)

  Since variables that are initialized to a `0` value will be `0` biased,
  `zero_debias` optionally enables scaling by the mathematically correct
  debiasing factor of
    1 - decay ** num_updates
  See `ADAM: A Method for Stochastic Optimization` Section 3 for more details
  (https://arxiv.org/abs/1412.6980).

  Args:
    variable: A Variable.
    value: A tensor with the same shape as 'variable'.
    decay: A float Tensor or float value.  The moving average decay.
    zero_debias: A python bool. If true, assume the variable is 0-intialized and
      unbias it, as in https://arxiv.org/abs/1412.6980. See docstring in
      `_zero_debias` for more details.
    name: Optional name of the returned operation.

  Returns:
    A reference to the input 'variable' tensor with the newly computed
    moving average.
  """
  with ops.name_scope(name, "AssignMovingAvg",
                      [variable, value, decay]) as scope:
    with ops.colocate_with(variable):
      decay = ops.convert_to_tensor(1.0 - decay, name="decay")
      if decay.dtype != variable.dtype.base_dtype:
        decay = math_ops.cast(decay, variable.dtype.base_dtype)
      if zero_debias:
        update_delta = _zero_debias(variable, value, decay)
      else:
        update_delta = (variable - value) * decay
      return state_ops.assign_sub(variable, update_delta, name=scope)


def weighted_moving_average(value,
                            decay,
                            weight,
                            truediv=True,
                            collections=None,
                            name=None):
  """Compute the weighted moving average of `value`.

  Conceptually, the weighted moving average is:
    `moving_average(value * weight) / moving_average(weight)`,
  where a moving average updates by the rule
    `new_value = decay * old_value + (1 - decay) * update`
  Internally, this Op keeps moving average variables of both `value * weight`
  and `weight`.

  Args:
    value: A numeric `Tensor`.
    decay: A float `Tensor` or float value.  The moving average decay.
    weight:  `Tensor` that keeps the current value of a weight.
      Shape should be able to multiply `value`.
    truediv:  Boolean, if `True`, dividing by `moving_average(weight)` is
      floating point division.  If `False`, use division implied by dtypes.
    collections:  List of graph collections keys to add the internal variables
      `value * weight` and `weight` to.
      Defaults to `[GraphKeys.GLOBAL_VARIABLES]`.
    name: Optional name of the returned operation.
      Defaults to "WeightedMovingAvg".

  Returns:
    An Operation that updates and returns the weighted moving average.
  """
  # Unlike assign_moving_average, the weighted moving average doesn't modify
  # user-visible variables. It is the ratio of two internal variables, which are
  # moving averages of the updates.  Thus, the signature of this function is
  # quite different than assign_moving_average.
  if collections is None:
    collections = [ops.GraphKeys.GLOBAL_VARIABLES]
  with variable_scope.variable_scope(name, "WeightedMovingAvg",
                                     [value, weight, decay]) as scope:
    value_x_weight_var = variable_scope.get_variable(
        "value_x_weight",
        shape=value.get_shape(),
        dtype=value.dtype,
        initializer=init_ops.zeros_initializer(),
        trainable=False,
        collections=collections)
    weight_var = variable_scope.get_variable(
        "weight",
        shape=weight.get_shape(),
        dtype=weight.dtype,
        initializer=init_ops.zeros_initializer(),
        trainable=False,
        collections=collections)
    numerator = assign_moving_average(
        value_x_weight_var, value * weight, decay, zero_debias=False)
    denominator = assign_moving_average(
        weight_var, weight, decay, zero_debias=False)

    if truediv:
      return math_ops.truediv(numerator, denominator, name=scope.name)
    else:
      return math_ops.div(numerator, denominator, name=scope.name)


def _zero_debias(unbiased_var, value, decay):
  """Compute the delta required for a debiased Variable.

  All exponential moving averages initialized with Tensors are initialized to 0,
  and therefore are biased to 0. Variables initialized to 0 and used as EMAs are
  similarly biased. This function creates the debias updated amount according to
  a scale factor, as in https://arxiv.org/abs/1412.6980.

  To demonstrate the bias the results from 0-initialization, take an EMA that
  was initialized to `0` with decay `b`. After `t` timesteps of seeing the
  constant `c`, the variable have the following value:

  ```
    EMA = 0*b^(t) + c*(1 - b)*b^(t-1) + c*(1 - b)*b^(t-2) + ...
        = c*(1 - b^t)
  ```

  To have the true value `c`, we would divide by the scale factor `1 - b^t`.

  In order to perform debiasing, we use two shadow variables. One keeps track of
  the biased estimate, and the other keeps track of the number of updates that
  have occurred.

  Args:
    unbiased_var: A Variable representing the current value of the unbiased EMA.
    value: A Tensor representing the most recent value.
    decay: A Tensor representing `1-decay` for the EMA.

  Returns:
    The amount that the unbiased variable should be updated. Computing this
    tensor will also update the shadow variables appropriately.
  """
  with variable_scope.variable_scope(
      unbiased_var.op.name, values=[unbiased_var, value, decay]) as scope:
    with ops.colocate_with(unbiased_var):
      with ops.control_dependencies(None):
        biased_initializer = init_ops.zeros_initializer(
            dtype=unbiased_var.dtype)(unbiased_var.get_shape())
        local_step_initializer = init_ops.zeros_initializer()
      biased_var = variable_scope.get_variable(
          "biased", initializer=biased_initializer, trainable=False)
      local_step = variable_scope.get_variable(
          "local_step",
          shape=[],
          dtype=unbiased_var.dtype,
          initializer=local_step_initializer,
          trainable=False)

      # Get an update ops for both shadow variables.
      update_biased = state_ops.assign_sub(biased_var,
                                           (biased_var - value) * decay,
                                           name=scope.name)
      update_local_step = local_step.assign_add(1)

      # Compute the value of the delta to update the unbiased EMA. Make sure to
      # use the new values of the biased variable and the local step.
      with ops.control_dependencies([update_biased, update_local_step]):
        # This function gets `1 - decay`, so use `1.0 - decay` in the exponent.
        unbiased_ema_delta = (unbiased_var - biased_var.read_value() /
                              (1 - math_ops.pow(
                                  1.0 - decay, local_step.read_value())))

      return unbiased_ema_delta


class ExponentialMovingAverage(object):
  """Maintains moving averages of variables by employing an exponential decay.

  When training a model, it is often beneficial to maintain moving averages of
  the trained parameters.  Evaluations that use averaged parameters sometimes
  produce significantly better results than the final trained values.

  The `apply()` method adds shadow copies of trained variables and add ops that
  maintain a moving average of the trained variables in their shadow copies.
  It is used when building the training model.  The ops that maintain moving
  averages are typically run after each training step.
  The `average()` and `average_name()` methods give access to the shadow
  variables and their names.  They are useful when building an evaluation
  model, or when restoring a model from a checkpoint file.  They help use the
  moving averages in place of the last trained values for evaluations.

  The moving averages are computed using exponential decay.  You specify the
  decay value when creating the `ExponentialMovingAverage` object.  The shadow
  variables are initialized with the same initial values as the trained
  variables.  When you run the ops to maintain the moving averages, each
  shadow variable is updated with the formula:

    `shadow_variable -= (1 - decay) * (shadow_variable - variable)`

  This is mathematically equivalent to the classic formula below, but the use
  of an `assign_sub` op (the `"-="` in the formula) allows concurrent lockless
  updates to the variables:

    `shadow_variable = decay * shadow_variable + (1 - decay) * variable`

  Reasonable values for `decay` are close to 1.0, typically in the
  multiple-nines range: 0.999, 0.9999, etc.

  Example usage when creating a training model:

  ```python
  # Create variables.
  var0 = tf.Variable(...)
  var1 = tf.Variable(...)
  # ... use the variables to build a training model...
  ...
  # Create an op that applies the optimizer.  This is what we usually
  # would use as a training op.
  opt_op = opt.minimize(my_loss, [var0, var1])

  # Create an ExponentialMovingAverage object
  ema = tf.train.ExponentialMovingAverage(decay=0.9999)

  # Create the shadow variables, and add ops to maintain moving averages
  # of var0 and var1.
  maintain_averages_op = ema.apply([var0, var1])

  # Create an op that will update the moving averages after each training
  # step.  This is what we will use in place of the usual training op.
  with tf.control_dependencies([opt_op]):
      training_op = tf.group(maintain_averages_op)

  ...train the model by running training_op...
  ```

  There are two ways to use the moving averages for evaluations:

  *  Build a model that uses the shadow variables instead of the variables.
     For this, use the `average()` method which returns the shadow variable
     for a given variable.
  *  Build a model normally but load the checkpoint files to evaluate by using
     the shadow variable names.  For this use the `average_name()` method.  See
     the @{tf.train.Saver} for more
     information on restoring saved variables.

  Example of restoring the shadow variable values:

  ```python
  # Create a Saver that loads variables from their saved shadow values.
  shadow_var0_name = ema.average_name(var0)
  shadow_var1_name = ema.average_name(var1)
  saver = tf.train.Saver({shadow_var0_name: var0, shadow_var1_name: var1})
  saver.restore(...checkpoint filename...)
  # var0 and var1 now hold the moving average values
  ```
  """

  def __init__(self, decay, num_updates=None, zero_debias=False,
               name="ExponentialMovingAverage"):
    """Creates a new ExponentialMovingAverage object.

    The `apply()` method has to be called to create shadow variables and add
    ops to maintain moving averages.

    The optional `num_updates` parameter allows one to tweak the decay rate
    dynamically. It is typical to pass the count of training steps, usually
    kept in a variable that is incremented at each step, in which case the
    decay rate is lower at the start of training.  This makes moving averages
    move faster.  If passed, the actual decay rate used is:

      `min(decay, (1 + num_updates) / (10 + num_updates))`

    Args:
      decay: Float.  The decay to use.
      num_updates: Optional count of number of updates applied to variables.
      zero_debias: If `True`, zero debias moving-averages that are initialized
        with tensors.
      name: String. Optional prefix name to use for the name of ops added in
        `apply()`.
    """
    self._decay = decay
    self._num_updates = num_updates
    self._zero_debias = zero_debias
    self._name = name
    self._averages = {}

  def apply(self, var_list=None):
    """Maintains moving averages of variables.

    `var_list` must be a list of `Variable` or `Tensor` objects.  This method
    creates shadow variables for all elements of `var_list`.  Shadow variables
    for `Variable` objects are initialized to the variable's initial value.
    They will be added to the `GraphKeys.MOVING_AVERAGE_VARIABLES` collection.
    For `Tensor` objects, the shadow variables are initialized to 0 and zero
    debiased (see docstring in `assign_moving_average` for more details).

    shadow variables are created with `trainable=False` and added to the
    `GraphKeys.ALL_VARIABLES` collection.  They will be returned by calls to
    `tf.global_variables()`.

    Returns an op that updates all shadow variables as described above.

    Note that `apply()` can be called multiple times with different lists of
    variables.

    Args:
      var_list: A list of Variable or Tensor objects. The variables
        and Tensors must be of types float16, float32, or float64.

    Returns:
      An Operation that updates the moving averages.

    Raises:
      TypeError: If the arguments are not all float16, float32, or float64.
      ValueError: If the moving average of one of the variables is already
        being computed.
    """
    # TODO(touts): op_scope
    if var_list is None:
      var_list = variables.trainable_variables()
    zero_debias_true = set()  # set of vars to set `zero_debias=True`
    for var in var_list:
      if var.dtype.base_dtype not in [dtypes.float16, dtypes.float32,
                                      dtypes.float64]:
        raise TypeError("The variables must be half, float, or double: %s" %
                        var.name)
      if var in self._averages:
        raise ValueError("Moving average already computed for: %s" % var.name)

      # For variables: to lower communication bandwidth across devices we keep
      # the moving averages on the same device as the variables. For other
      # tensors, we rely on the existing device allocation mechanism.
      with ops.control_dependencies(None):
        if isinstance(var, variables.Variable):
          avg = slot_creator.create_slot(var,
                                         var.initialized_value(),
                                         self._name,
                                         colocate_with_primary=True)
          # NOTE(mrry): We only add `tf.Variable` objects to the
          # `MOVING_AVERAGE_VARIABLES` collection.
          ops.add_to_collection(ops.GraphKeys.MOVING_AVERAGE_VARIABLES, var)
        else:
          avg = slot_creator.create_zeros_slot(
              var,
              self._name,
              colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
          if self._zero_debias:
            zero_debias_true.add(avg)
      self._averages[var] = avg

    with ops.name_scope(self._name) as scope:
      decay = ops.convert_to_tensor(self._decay, name="decay")
      if self._num_updates is not None:
        num_updates = math_ops.cast(self._num_updates,
                                    dtypes.float32,
                                    name="num_updates")
        decay = math_ops.minimum(decay,
                                 (1.0 + num_updates) / (10.0 + num_updates))
      updates = []
      for var in var_list:
        zero_debias = self._averages[var] in zero_debias_true
        updates.append(assign_moving_average(
            self._averages[var], var, decay, zero_debias=zero_debias))
      return control_flow_ops.group(*updates, name=scope)

  def average(self, var):
    """Returns the `Variable` holding the average of `var`.

    Args:
      var: A `Variable` object.

    Returns:
      A `Variable` object or `None` if the moving average of `var`
      is not maintained.
    """
    return self._averages.get(var, None)

  def average_name(self, var):
    """Returns the name of the `Variable` holding the average for `var`.

    The typical scenario for `ExponentialMovingAverage` is to compute moving
    averages of variables during training, and restore the variables from the
    computed moving averages during evaluations.

    To restore variables, you have to know the name of the shadow variables.
    That name and the original variable can then be passed to a `Saver()` object
    to restore the variable from the moving average value with:
      `saver = tf.train.Saver({ema.average_name(var): var})`

    `average_name()` can be called whether or not `apply()` has been called.

    Args:
      var: A `Variable` object.

    Returns:
      A string: The name of the variable that will be used or was used
      by the `ExponentialMovingAverage class` to hold the moving average of
      `var`.
    """
    if var in self._averages:
      return self._averages[var].op.name
    return ops.get_default_graph().unique_name(
        var.op.name + "/" + self._name, mark_as_used=False)

  def variables_to_restore(self, moving_avg_variables=None):
    """Returns a map of names to `Variables` to restore.

    If a variable has a moving average, use the moving average variable name as
    the restore name; otherwise, use the variable name.

    For example,

    ```python
      variables_to_restore = ema.variables_to_restore()
      saver = tf.train.Saver(variables_to_restore)
    ```

    Below is an example of such mapping:

    ```
      conv/batchnorm/gamma/ExponentialMovingAverage: conv/batchnorm/gamma,
      conv_4/conv2d_params/ExponentialMovingAverage: conv_4/conv2d_params,
      global_step: global_step
    ```
    Args:
      moving_avg_variables: a list of variables that require to use of the
        moving variable name to be restored. If None, it will default to
        variables.moving_average_variables() + variables.trainable_variables()

    Returns:
      A map from restore_names to variables. The restore_name can be the
      moving_average version of the variable name if it exist, or the original
      variable name.
    """
    name_map = {}
    if moving_avg_variables is None:
      # Include trainable variables and variables which have been explicitly
      # added to the moving_average_variables collection.
      moving_avg_variables = variables.trainable_variables()
      moving_avg_variables += variables.moving_average_variables()
    # Remove duplicates
    moving_avg_variables = set(moving_avg_variables)
    # Collect all the variables with moving average,
    for v in moving_avg_variables:
      name_map[self.average_name(v)] = v
    # Make sure we restore variables without moving average as well.
    for v in list(set(variables.global_variables()) - moving_avg_variables):
      if v.op.name not in name_map:
        name_map[v.op.name] = v
    return name_map



Appendix 2: Background on the moving average method (reposted)

Source: http://wiki.mbalib.com/wiki/%E7%A7%BB%E5%8A%A8%E5%B9%B3%E5%9D%87%E6%B3%95

The moving average method is also known as the sliding average method or the sliding average model method (Moving Average, MA).



What is the moving average method?

The moving average method uses a set of recent actual data values to forecast quantities such as product demand or company capacity for one or several periods ahead, and is a common technique for near-term forecasting. When product demand is neither growing nor declining rapidly and shows no seasonality, the method effectively removes random fluctuation from the forecast and is very useful. Moving average methods are classified by the weights they assign to the values used in the forecast.

The moving average method is a simple smoothing forecasting technique. Its basic idea: working through the time series term by term, compute sequential averages over a fixed number of terms so as to reflect the long-term trend. When cyclical variation and random fluctuation make the series swing widely and obscure its direction of development, the moving average removes the influence of those factors and reveals the direction and trend of the series (the trend line), from which the long-term trend can then be forecast.



Types of moving average methods

Moving averages fall into two kinds: the simple moving average and the weighted moving average.
 
1. Simple moving average

In a simple moving average every value carries the same weight. The formula is (see the sketch after the definitions):

Ft = ( At-1 + At-2 + At-3 + … + At-n ) / n

where:

· Ft — the forecast for the next period;

· n — the number of periods in the moving average;

· At-1 — the actual value in the previous period;

· At-2, At-3 and At-n — the actual values two periods ago, three periods ago, and so on up to n periods ago.
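
As a quick check of the formula, here is a minimal Python sketch (the demand figures are made up for illustration):

def simple_moving_average(history, n):
    # Forecast the next period as the mean of the last n observations.
    if len(history) < n:
        raise ValueError("need at least n observations")
    return sum(history[-n:]) / n

demand = [38, 45, 35, 49, 70, 43]        # hypothetical monthly demand
print(simple_moving_average(demand, 3))  # (49 + 70 + 43) / 3 = 54.0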
 
2. Weighted moving average

A weighted moving average assigns a different weight to each value within the fixed span. The rationale: data from different historical periods do not carry equal information about future demand. Apart from cyclic variation with period n, values farther from the target period have less influence and should therefore receive lower weights. The formula is (see the sketch after the definitions):

Ft = w1·At-1 + w2·At-2 + w3·At-3 + … + wn·At-n

where:

· w1 — the weight of the actual sales in period t-1;

· w2 — the weight of the actual sales in period t-2;

· wn — the weight of the actual sales in period t-n;

· n — the number of periods, with w1 + w2 + … + wn = 1.

When applying the weighted moving average, the choice of weights deserves attention. Experience and trial calculation are the simplest ways to choose them. In general, the most recent data are the most indicative of the future, so they should receive larger weights; for example, last month's profit and capacity predict next month's better than figures from several months back do. If the data are seasonal, however, the weights should be seasonal as well.
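
The following minimal sketch implements this formula with unnormalized weights (dividing by their sum, which is equivalent to weights that sum to 1); the data and the 1-2-3 weighting match Case 3 below:

def weighted_moving_average(history, weights):
    # Forecast the next period; weights[-1] applies to the newest value.
    recent = history[-len(weights):]
    return sum(w * a for w, a in zip(weights, recent)) / sum(weights)

sales = [38, 45, 35]  # months 1-3 from Case 3
print(round(weighted_moving_average(sales, [1, 2, 3]), 2))  # 38.83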


 

Advantages and disadvantages of the moving average method

Forecasting with a moving average smooths out the effect of sudden demand fluctuations on the forecast. The method has the following drawbacks, however:

1. Increasing the number of periods (i.e. increasing n) smooths fluctuations better, but makes the forecast less sensitive to actual changes in the data;

2. The moving average does not always reflect trends well. Being an average, the forecast lags at past levels and cannot anticipate swings to higher or lower values;

3. The method requires a large amount of historical data.



Case studies of the moving average method

Case 1: Applying the moving average method to bus travel-time prediction

The raw bus running-time data were collected manually: recorders rode each bus from the first stop to the last and noted the running time between every pair of stops. The route studied was Changchun bus route 306, from Changchun University to the railway station. Data were collected from April 3 to April 5, 2001, all weekdays; since bus running times vary with the type of day, these data are used only to predict weekday running times. The data cover the running time between the Gongnong Square and Guilin Road stops.

(1) With N = 3 to 20, the predictions obtained with the moving average method are shown in Table 1.

Table 1: Moving average predictions
 
 
[Table 1 was flattened during extraction. It is a grid whose rows are departure times K from 6:40 (k = 15) to 8:24 (k = 119) and whose columns are the window sizes N = 3 to 20; each cell holds the predicted running time in minutes. The predictions range from 4 to 6 minutes, and a column's first prediction appears only once N observations are available for that window size.]
 
(2) The prediction curves obtained with N = 3 to 20 are shown in the figure.

Note: the horizontal axis is the time of day and the vertical axis the predicted travel time. The horizontal tick spacing is one minute, so individual tick values cannot be shown; the vertical ticks are at 2, 4, 6 and 8 minutes. The origin of the plot is at (6:38, 2).

The prediction plot shows that the curves for different values of N have roughly the same shape, but the larger N is, the more its curve lags behind the curve for the previous value of N. This is because each increase of N discards one more value at the start of the time series when the moving average is computed.

(3) The weekday error indicators of the one-step moving average for N = 3 to 20 are given in Table 2.

Table 2: Weekday error indicators of the one-step moving average
 
N              | 3      | 4      | 5      | 6      | 7      | 8      | 9      | 10     | 11
relative error | 0.1424 | 0.1457 | 0.1389 | 0.1321 | 0.1502 | 0.1511 | 0.1478 | 0.1400 | 0.1455

N              | 12     | 13     | 14     | 15     | 16     | 17     | 18     | 19     | 20
relative error | 0.1428 | 0.1409 | 0.1500 | 0.1510 | 0.1423 | 0.1470 | 0.1523 | 0.1655 | 0.1620

where the relative error = (1/N) Σ | predicted − actual | / actual.
 
The table shows that for weekday prediction the relative error is smallest at N = 6, so that parameter is considered the most suitable and can serve as the basis for predicting bus travel times between the Gongnong Square and Guilin Road stops.
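
A minimal sketch of this error metric, assuming it is the mean absolute percentage error over the forecast points (the data below are made up):

def relative_error(predicted, actual):
    # Mean of |prediction error| / actual over all forecast points.
    n = len(actual)
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / n

print(relative_error([5, 4, 5], [4, 4, 6]))  # (0.25 + 0 + 1/6) / 3 ≈ 0.1389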



Case 2: Using the simple moving average for real estate prices [2]

The monthly prices of a certain type of real estate in 2001 are shown in the second column of the table below. Affected by various uncertain factors, the price swings up and down considerably from month to month, and without analysis its trend is hard to see. By adding up the prices over every few months to compute moving averages and building a moving-average time series, the direction and magnitude of the movement stand out from the smoothed trend, and future prices can then be forecast.

How many months each moving average should cover depends on the length of the series and its cycle of variation. If the series is long and the cycle long, every 6 or even 12 months can be used; otherwise, every 2 to 5 months. For this example's 2001 prices, 5-month moving averages of the actual values are used. The method: add the prices for January through May and divide by 5 to get 684 yuan per square meter; add February through June and divide by 5 to get 694; add March through July and divide by 5 to get 704; and so on, as shown in the third column. Then compute the month-over-month increase of the 5-month moving averages, as shown in the fourth column.
 
Monthly prices of a certain type of real estate in 2001 (yuan per square meter)

Month | Actual price | 5-month moving average | Monthly increase of the moving average
1     | 670          |                        |
2     | 680          |                        |
3     | 690          | 684                    |
4     | 680          | 694                    | 10
5     | 700          | 704                    | 10
6     | 720          | 714                    | 10
7     | 730          | 726                    | 12
8     | 740          | 738                    | 12
9     | 740          | 750                    | 12
10    | 760          | 762                    | 12
11    | 780          |                        |
12    | 790          |                        |
 
Suppose the price of this type of real estate for January 2002 is to be forecast. Since the last moving average, 762, is centered on a month 3 months before January 2002, the forecast is 762 + 12 × 3 = 798 (yuan per square meter).
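
A minimal sketch of this calculation (a centered 5-month moving average plus linear extrapolation from the last monthly increase):

prices = [670, 680, 690, 680, 700, 720, 730, 740, 740, 760, 780, 790]

# 5-month moving averages, centered on months 3 through 10.
ma = [sum(prices[i:i + 5]) / 5 for i in range(len(prices) - 4)]
print(ma[0], ma[-1])          # 684.0 762.0

increase = ma[-1] - ma[-2]    # 762 - 750 = 12 yuan per month
# The last average is centered 3 months before January 2002:
print(ma[-1] + increase * 3)  # 798.0 yuan per square meter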


 
Case 3: Using the weighted moving average to forecast sales

A shopping mall's actual sales from January to November are shown in the table below. With a span of 3 months and weights 1, 2, 3, use the weighted moving average to forecast December's sales.
 
Weighted moving average calculation (unit: 10,000 yuan)

Month | Sales | 3-month weighted moving average
1     | 38    |
2     | 45    |
3     | 35    |
4     | 49    | 38.83
5     | 70    | 43.67
6     | 43    | 57.17
7     | 46    | 53.00
8     | 55    | 49.00
9     | 45    | 50.00
10    | 68    | 48.5
11    | 64    | 58.17
12    |       | 62.17
 
Solution:

F4 = (3 × 35 + 2 × 45 + 1 × 38) / 6 = 38.83 (10,000 yuan)

F5 = (3 × 49 + 2 × 35 + 1 × 45) / 6 = 43.67 (10,000 yuan)

……

F12 = (3 × 64 + 2 × 68 + 1 × 45) / 6 = 62.17 (10,000 yuan)























