Having recently moved from Caffe to TensorFlow, I suddenly noticed that batch normalization has more parameters. Following my usual principle of solving problems as they come up, I went looking for what was going on.
Caffe
The BN layer parameters in Caffe:
message BatchNormParameter {
// If false, normalization is performed over the current mini-batch
// and global statistics are accumulated (but not yet used) by a moving
// average.
// If true, those accumulated mean and variance values are used for the
// normalization.
// By default, it is set to false when the network is in the training
// phase and true when the network is in the testing phase.
optional bool use_global_stats = 1;
// What fraction of the moving average remains each iteration?
// Smaller values make the moving average decay faster, giving more
// weight to the recent values.
// Each iteration updates the moving average @f$S_{t-1}@f$ with the
// current mean @f$ Y_t @f$ by
// @f$ S_t = (1-\beta)Y_t + \beta \cdot S_{t-1} @f$, where @f$ \beta @f$
// is the moving_average_fraction parameter.
optional float moving_average_fraction = 2 [default = .999];
// Small value to add to the variance estimate so that we don't divide by
// zero.
optional float eps = 3 [default = 1e-5];
}
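The moving-average update described in the comment above, S_t = (1-β)Y_t + β·S_{t-1}, can be sketched in a few lines of Python. This is my own minimal illustration of the formula, not Caffe's actual code; the function and variable names are made up:

```python
def update_moving_average(s_prev, y_t, beta=0.999):
    """One step of Caffe-style moving-average accumulation:
    S_t = (1 - beta) * Y_t + beta * S_{t-1},
    where beta is moving_average_fraction (default 0.999)."""
    return (1.0 - beta) * y_t + beta * s_prev

# Example: blend a few mini-batch means into a running mean.
# A smaller beta makes the running value track recent batches faster.
running_mean = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    running_mean = update_moving_average(running_mean, batch_mean, beta=0.9)
```

With β = 0.9 and three identical batch means of 1.0, the running mean climbs toward 1.0 but has only reached about 0.27, which shows why the accumulated statistics need many iterations before they are trustworthy at test time.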
TensorFlow
The BN layer parameters in TensorFlow:
batch_normalization(
x,
mean,
variance,
offset,
scale,
variance_epsilon,
name=None
)
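According to the TensorFlow documentation, this op computes scale * (x - mean) / sqrt(variance + variance_epsilon) + offset. A minimal NumPy sketch of that same computation (my own illustration of the formula, not TensorFlow's implementation):

```python
import numpy as np

def batch_normalization(x, mean, variance, offset, scale, variance_epsilon):
    """Compute what tf.nn.batch_normalization is documented to compute:
    scale * (x - mean) / sqrt(variance + variance_epsilon) + offset."""
    inv_std = 1.0 / np.sqrt(variance + variance_epsilon)
    return scale * (x - mean) * inv_std + offset

# Example: normalize a 2-sample mini-batch per feature (axis 0).
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
mu = x.mean(axis=0)   # per-feature mini-batch mean
var = x.var(axis=0)   # per-feature mini-batch variance
y = batch_normalization(x, mu, var, offset=0.0, scale=1.0,
                        variance_epsilon=1e-3)
```

Note that, unlike Caffe's layer, the caller supplies mean and variance explicitly, so the same op serves both training (mini-batch statistics) and inference (accumulated statistics).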
What are the extra parameters?
I had assumed BN was just a process that standardizes a tensor. To figure out what the extra parameters are, I went back to the original BN paper:
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
The paper states that, for a single feature dimension, the simplified mini-batch BN operation is:
$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i$$