Stanford cs231n course notes: assignment2 BatchNormalization

临江轩

Article link: https://blog.csdn.net/weixin_39880579/article/details/86773600

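This article explains the principle, implementation and application of BatchNormalization as covered in Stanford's cs231n course. BatchNormalization speeds up network training and reduces internal covariate shift. It is usually placed after a fully connected or convolutional layer and before the activation function. Because it depends on the batch size, accuracy can drop when training with small batches. Layer Normalization, by contrast, normalizes over the features of each sample, so it is insensitive to batch size, but it tends to work less well when the feature dimension is small. Experiments show that BatchNormalization helps the network converge faster, and that Layer Normalization is an effective replacement for BatchNormalization in some cases.

(Summary generated automatically by CSDN.)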

1. BatchNormalization: principle

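First, credit to an excellent article by an expert: 《详解深度学习中的Normalization，BN/LN/WN》 ("A Detailed Explanation of Normalization in Deep Learning: BN/LN/WN").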

Purpose: to make each dimension zero-mean unit-variance.
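As a quick numeric check of what "zero-mean unit-variance per dimension" means, the sketch below normalizes each column of an (N, D) mini-batch using that column's own statistics. The data, the random seed and `eps = 1e-5` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Mini-batch of N=200 samples with D=3 features on very different scales.
x = rng.normal(loc=[10.0, -3.0, 0.5], scale=[5.0, 0.1, 2.0], size=(200, 3))

eps = 1e-5  # small constant for numerical stability
# axis=0 gives one mean/variance per feature (column), computed over the batch.
x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

print(x_hat.mean(axis=0))  # each entry is ~0 (up to floating-point error)
print(x_hat.std(axis=0))   # each entry is ~1 (slightly below 1 because of eps)
```

Even though the raw features differ by two orders of magnitude in scale, every normalized column ends up on the same footing.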

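Algorithm:

(The final scale-and-shift step is needed because, after normalization to zero mean and unit variance, all the data sit around 0, and this makes the features harder to learn. A learnable scale $\gamma^{(k)}$ and shift $\beta^{(k)}$ therefore rescale and shift each feature toward a more realistic distribution. With $\gamma^{(k)} = \sqrt{Var[x^{(k)}]}$ and $\beta^{(k)} = E[x^{(k)}]$, they can even restore the original activations.)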
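The sketch below checks numerically that scale and shift can undo the normalization entirely. Here $\gamma$ and $\beta$ are set by hand rather than learned. Because a small `eps` is added to the variance for numerical stability, the exact inverse is $\gamma = \sqrt{Var + eps}$ and $\beta = E[x]$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=(128, 4))  # (N, D) activations
eps = 1e-5

mu = x.mean(axis=0)
var = x.var(axis=0)
x_hat = (x - mu) / np.sqrt(var + eps)  # zero-mean, unit-variance per feature

# Per-feature scale and shift, chosen by hand to invert the normalization.
gamma = np.sqrt(var + eps)
beta = mu
y = gamma * x_hat + beta

print(np.allclose(y, x))  # True: the original activations are recovered
```

In practice $\gamma$ and $\beta$ are trained like any other parameter. The point is only that the identity mapping remains reachable, so adding BN does not reduce what the layer can represent.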

公式： $\hat{x}^{(k)} =\frac{x^{(k)} - E[x^{(k)}]}{ \sqrt{Var[x^{(k)}]}}$
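Here is a worked example of the formula for a single feature $k$. As during training, $E[x^{(k)}]$ and $Var[x^{(k)}]$ are estimated from a mini-batch, in this case four values. The formula is followed exactly as written, so no $\epsilon$ is added.

```python
import numpy as np

# One feature k observed over a mini-batch of 4 samples.
x_k = np.array([1.0, 2.0, 3.0, 4.0])

mean = x_k.mean()                     # E[x^(k)]   = 2.5
var = x_k.var()                       # Var[x^(k)] = 1.25 (population variance, ddof=0)
x_hat = (x_k - mean) / np.sqrt(var)   # the formula above, without eps

print(x_hat)  # approximately [-1.3416, -0.4472, 0.4472, 1.3416]
```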

Placement: Usually inserted after Fully Connected or Convolutional layers, and before nonlinearity.
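The sketch below illustrates this placement with a hypothetical helper, `affine_bn_relu_forward`. The helper name and the training-mode-only BN inside it are assumptions made for this sketch, not cs231n starter code. With deliberately badly scaled weights and a large negative bias, most pre-activations are negative, so a plain ReLU outputs mostly zeros. Normalizing before the nonlinearity recenters each hidden unit, so a sizable fraction of units becomes active again.

```python
import numpy as np

def affine_bn_relu_forward(x, w, b, gamma, beta, eps=1e-5):
    """Hypothetical helper: fully connected layer -> batchnorm (train mode) -> ReLU."""
    a = x @ w + b                              # fully connected (affine) layer
    mu, var = a.mean(axis=0), a.var(axis=0)    # per-unit batch statistics
    a_hat = (a - mu) / np.sqrt(var + eps)      # normalize each hidden unit
    z = gamma * a_hat + beta                   # learnable scale and shift
    return np.maximum(0.0, z)                  # nonlinearity comes after BN

rng = np.random.default_rng(0)
N, D, H = 32, 10, 5
x = rng.standard_normal((N, D))
w = 10.0 * rng.standard_normal((D, H))   # badly scaled weights
b = np.full(H, -50.0)                    # large negative bias

plain = np.maximum(0.0, x @ w + b)       # affine -> ReLU, no BN
with_bn = affine_bn_relu_forward(x, w, b, gamma=np.ones(H), beta=np.zeros(H))

print(with_bn.shape)                     # (32, 5)
print((plain > 0).mean())                # small fraction of active units
print((with_bn > 0).mean())              # roughly half of the units active
```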

(From: cs231n_2018_lecture06)

3. BatchNormalization: implementation

Algorithm:
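In `'train'` mode, batch normalization uses the statistics of the current mini-batch. At test time, however, a single example (or a small batch) gives unreliable statistics. The `momentum`, `running_mean` and `running_var` keys of `bn_param` in the code below exist to keep exponential moving averages of the training statistics for use in `'test'` mode.

The following is a minimal sketch of that bookkeeping. It assumes the update `running = momentum * running + (1 - momentum) * batch_statistic`. `batchnorm_forward_sketch` is a hypothetical name. It returns no cache, so it cannot replace the assignment's function once a backward pass is needed.

```python
import numpy as np

def batchnorm_forward_sketch(x, gamma, beta, bn_param):
    """Hypothetical helper: batchnorm forward for (N, D) input, train/test modes, no cache."""
    mode = bn_param["mode"]
    eps = bn_param.get("eps", 1e-5)
    momentum = bn_param.get("momentum", 0.9)
    D = x.shape[1]
    running_mean = bn_param.get("running_mean", np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get("running_var", np.zeros(D, dtype=x.dtype))

    if mode == "train":
        # Normalize with the statistics of the current mini-batch ...
        mu, var = x.mean(axis=0), x.var(axis=0)
        # ... and fold them into exponential moving averages for test time.
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    elif mode == "test":
        # Normalize with the accumulated running statistics instead.
        mu, var = running_mean, running_var
    else:
        raise ValueError(f"Invalid forward batchnorm mode {mode!r}")

    # Store the (possibly updated) running statistics back into bn_param.
    bn_param["running_mean"] = running_mean
    bn_param["running_var"] = running_var

    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
D = 3
gamma, beta = np.ones(D), np.zeros(D)
bn_param = {"mode": "train"}
for _ in range(200):
    batch = 5.0 + 2.0 * rng.standard_normal((64, D))  # true mean 5, true variance 4
    batchnorm_forward_sketch(batch, gamma, beta, bn_param)

print(bn_param["running_mean"])  # close to 5 for every feature
print(bn_param["running_var"])   # close to 4 for every feature

bn_param["mode"] = "test"
single = np.array([[5.0, 7.0, 3.0]])  # one test example; its own batch statistics would be useless
out = batchnorm_forward_sketch(single, gamma, beta, bn_param)
print(out)  # roughly [[0, 1, -1]]
```

Keeping the running statistics inside `bn_param` (a mutable dict) lets the same function be called repeatedly during training without any extra state object. This mirrors how the function signature in the code below is set up.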

Forward pass:

import numpy as np


def batchnorm_forward(x, gamma, beta, bn_param):
    """
    Input:
    - x: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var: Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - out: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
    """
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)
    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dt