Table of Contents
- The Principle of BatchNormalization
- Implementing BatchNormalization
- Applying BatchNormalization
- Layer Normalization
- References
1. The Principle of BatchNormalization
To start, a recommended article by an expert: 《详解深度学习中的Normalization, BN/LN/WN》 (A Detailed Explanation of Normalization in Deep Learning: BN/LN/WN).
Purpose: to make each dimension of the features zero-mean and unit-variance.
Algorithm:
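As a sketch of the algorithm (this is the standard training-time transform from the BN paper of Ioffe & Szegedy, 2015, stated over a minibatch $\mathcal{B} = \{x_1, \ldots, x_m\}$):

$$\mu_\mathcal{B} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_\mathcal{B}^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_\mathcal{B})^2$$

$$\hat{x}_i = \frac{x_i - \mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$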
(The final scale and shift are needed because the zero-mean, unit-variance normalization in the previous step leaves the data clustered around 0, which makes the features harder to learn; the learnable rescaling and shifting let the output move back toward a realistic distribution.)
Formula:
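Sketched here per the BN paper, each feature dimension $k$ is normalized independently:

$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}$$

At test time, $\mathrm{E}[x^{(k)}]$ and $\mathrm{Var}[x^{(k)}]$ are estimated by the running averages maintained during training (the momentum update in the code below).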
Placement: usually inserted after fully connected or convolutional layers, and before the nonlinearity.
(From: cs231n_2018_lecture06)
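A minimal sketch of this placement (the composition below is illustrative, not code from the lecture; batchnorm_forward is the function implemented in the next section):

import numpy as np

def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param):
    a = x.dot(w) + b                                             # fully connected layer
    a_norm, cache = batchnorm_forward(a, gamma, beta, bn_param)  # normalize the pre-activations
    out = np.maximum(0, a_norm)                                  # nonlinearity (ReLU)
    return out, cache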
2. Implementing BatchNormalization
Algorithm:
Forward pass:
import numpy as np

def batchnorm_forward(x, gamma, beta, bn_param):
    """
    Input:
    - x: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var: Array of shape (D,) giving running variance of features
    Returns a tuple of:
    - out: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
    """
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)
    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))
    if mode == 'train':
        # Normalize with minibatch statistics, then apply the learnable scale and shift
        sample_mean = x.mean(axis=0)
        sample_var = x.var(axis=0)
        x_hat = (x - sample_mean) / np.sqrt(sample_var + eps)
        out = gamma * x_hat + beta
        # Exponential running averages of the statistics, used at test time
        running_mean = momentum * running_mean + (1 - momentum) * sample_mean
        running_var = momentum * running_var + (1 - momentum) * sample_var
        cache = (x, x_hat, sample_mean, sample_var, gamma, eps)
    elif mode == 'test':
        # Normalize with the running statistics accumulated during training
        x_hat = (x - running_mean) / np.sqrt(running_var + eps)
        out = gamma * x_hat + beta
        cache = None
    else:
        raise ValueError('Invalid forward batchnorm mode "%s"' % mode)
    # Store the updated running statistics back into bn_param
    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var
    return out, cache
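A quick sanity check in train mode (illustrative values): with gamma = 1 and beta = 0, each output feature column should be approximately zero-mean and unit-variance.

np.random.seed(0)
x = 5 * np.random.randn(100, 4) + 12      # 100 samples, 4 features, non-zero mean
gamma = np.ones(4)
beta = np.zeros(4)
bn_param = {'mode': 'train'}
out, _ = batchnorm_forward(x, gamma, beta, bn_param)
print(out.mean(axis=0))                   # approximately [0, 0, 0, 0]
print(out.std(axis=0))                    # approximately [1, 1, 1, 1]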