Understanding moving_mean and moving_variance in batch normalization

imumu_xi

Categories: Machine Learning, tensorflow. Tags: batch normalization, tensorflow, deep learning

Article link: https://blog.csdn.net/sinat_30372583/article/details/79943743

While reading the training code of a model that uses batch normalization, I came across the following lines:

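# Collect the ops that update moving_mean / moving_variance and make
# train_op depend on them, so they run on every training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss, global_step)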

************************************************************
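First, a quick note on what tf.control_dependencies does.

In some machine learning programs we want to specify that certain operations depend on others having run first. tf.control_dependencies() lets us express that.

control_dependencies(control_inputs) returns a context manager that specifies control dependencies. Used with the with keyword, it makes every operation created inside the context execute only after the operations in control_inputs have executed.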

with g.control_dependencies([a, b, c]):
    # `d` and `e` will only run after `a`, `b`, and `c` have executed.
    d = ...
    e = ...
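To make the ordering rule concrete, here is a toy model in plain Python. It is only a sketch of the idea and not how TensorFlow implements it: the Op class, the run function, and the state dictionary are all hypothetical names made up for this illustration.

```python
# Toy model of control dependencies. NOT TensorFlow's implementation;
# Op, run and state are hypothetical names used only for illustration.
class Op:
    def __init__(self, name, fn, control_inputs=()):
        self.name = name
        self.fn = fn
        self.control_inputs = list(control_inputs)


def run(op, log, done=None):
    """Run `op`, but only after each of its control inputs has run once."""
    done = set() if done is None else done
    if op.name in done:
        return
    for dep in op.control_inputs:
        run(dep, log, done)
    op.fn()
    log.append(op.name)
    done.add(op.name)


state = {"moving_mean": 0.0, "weight": 1.0}


def update_moving_mean():
    state["moving_mean"] = 0.5  # stands in for a batch norm update op


def apply_gradient():
    state["weight"] -= 0.1  # stands in for one optimizer step


update_op = Op("update_moving_mean", update_moving_mean)
train_op = Op("train_op", apply_gradient, control_inputs=[update_op])

execution_log = []
run(train_op, execution_log)
print(execution_log)  # ['update_moving_mean', 'train_op']
```

Asking for train_op alone is enough to trigger the update first, which is the behavior the with block above sets up for the batch normalization update ops.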

Here are two write-ups that explain this function:

https://blog.csdn.net/pku_jade/article/details/73498753

https://blog.csdn.net/u012436149/article/details/72084744

*************************************************************

Back to the main topic.

My network definition uses tf.layers.batch_normalization. Looking at its API documentation, I found this note: "Note: when training, the moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op."
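A small sketch in plain Python (no TensorFlow) of what is at stake if the update ops never run. It assumes the layer's default initial values (moving_mean = 0, moving_variance = 1) and default epsilon of 0.001, ignores the learned scale and offset, and uses a made-up list of activations. The normalize helper is a hypothetical name for this illustration.

```python
# Sketch only: inference-time normalization with stale vs. up-to-date
# statistics. The activations are made-up numbers; gamma/beta are ignored.
EPSILON = 1e-3  # assumed default epsilon of tf.layers.batch_normalization


def normalize(values, mean, variance, eps=EPSILON):
    return [(v - mean) / (variance + eps) ** 0.5 for v in values]


activations = [4.0, 5.0, 6.0, 5.0]

# If the update ops never ran, inference still sees the initial values.
stale = normalize(activations, mean=0.0, variance=1.0)

# Statistics that actually describe the data.
data_mean = sum(activations) / len(activations)
data_var = sum((v - data_mean) ** 2 for v in activations) / len(activations)
fresh = normalize(activations, data_mean, data_var)

stale_mean = sum(stale) / len(stale)
fresh_mean = sum(fresh) / len(fresh)
print(stale_mean)  # stays close to 5: the inputs are barely changed
print(fresh_mean)  # close to 0: the inputs are properly centered
```

With stale statistics the layer passes the activations through almost unchanged at inference time, while during training the same layer was producing centered outputs. That mismatch is why the update ops have to be tied to train_op.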

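This raises the question of why that is necessary. First, here is what the original batch normalization paper says about the moving average: during training, the mean and variance used for normalization are those of each mini-batch. At test time, the mean and variance are not computed from the test samples. Instead, the expectation of the per-batch means and the expectation of the per-batch variances collected during training are used, and inference is run with those values. The details are explained below.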
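As a rough numerical illustration of that idea, the sketch below tracks running statistics across many mini-batches in plain Python. Note the substitution: instead of the paper's average over all batches, it uses an exponential moving average of the form moving = momentum * moving + (1 - momentum) * batch_value, which is how moving_mean and moving_variance are usually maintained in practice. The momentum of 0.99, the Gaussian toy data, and the helper names are assumptions made for this example.

```python
import random

MOMENTUM = 0.99  # assumed; matches the default of tf.layers.batch_normalization


def batch_stats(batch):
    """Mean and (biased) variance of one mini-batch."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return mean, variance


def ema_update(moving, batch_value, momentum=MOMENTUM):
    """One exponential-moving-average step."""
    return momentum * moving + (1.0 - momentum) * batch_value


random.seed(0)
moving_mean, moving_variance = 0.0, 1.0  # initial values before training

for step in range(2000):
    # Toy activations drawn from N(mean=3, std=2), so the true variance is 4.
    batch = [random.gauss(3.0, 2.0) for _ in range(64)]
    mean, variance = batch_stats(batch)
    # Training would normalize this batch with (mean, variance) here.
    moving_mean = ema_update(moving_mean, mean)
    moving_variance = ema_update(moving_variance, variance)

# At test time these two numbers are used instead of per-batch statistics.
print(moving_mean, moving_variance)  # roughly 3 and 4
```

The running values drift from their initial 0 and 1 toward the statistics of the training data, but only if the update step actually executes on every training iteration.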