[CS231n Assignment 2 #02] Batch Normalization

Assignment overview
  • Assignment homepage: Assignment 2
  • Goal: one way to make deep neural networks train better is to use more sophisticated optimization methods such as SGD+Momentum, Adam, or RMSProp; another is to change the network architecture, for example by adding the batch normalization layers we implement in this section.
  • Official starter code: Assignment 2 code
  • Assignment source file: BatchNormalization.ipynb
1. Batch Normalization

Machine learning methods tend to work well when the input data are uncorrelated, zero-mean, and unit-variance. But when we train a deep neural network, even if we preprocess the data so the inputs follow such a distribution, the transformations applied by successive layers change it. Worse, as the weights keep being updated, the distribution of the features fed into each layer keeps drifting.
The authors of the batch normalization paper (recommended reading [1]) hypothesize that this drift in the input feature distribution makes deep networks hard to train, and propose inserting a batch normalization (BN) layer to deal with it.
At training time, we use a minibatch of data to estimate the mean and variance of each feature dimension, and use these statistics to normalize the incoming minibatch to zero mean and unit variance. At the same time, we maintain running averages of the mean and variance over the training set, which are used to normalize data at test time.
However, such a BN layer may hurt the network's representational power by changing the distribution a layer sees as input: for some layers a distribution that is not zero-mean and unit-variance might actually work better. So each BN layer also learns a per-feature shift factor and scale factor that let it partially restore the feature distribution, so the output need not strictly follow the standardized distribution, which keeps the network expressive.
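Concretely, for a minibatch of $N$ examples the BN layer computes, independently for each feature dimension (these are the standard formulas from the paper; $\epsilon$ is a small constant for numerical stability):

$$\mu_B = \frac{1}{N}\sum_{i=1}^{N} x_i,\qquad \sigma_B^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu_B\right)^2,\qquad \hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}},\qquad y_i = \gamma\,\hat{x}_i+\beta$$

Here $\gamma$ (scale) and $\beta$ (shift) are the learned per-feature parameters; at test time $\mu_B$ and $\sigma_B^2$ are replaced by the running averages accumulated during training.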

1.1 Forward pass of the BN layer
  • In the file cs231n/layers.py, implement the batch normalization forward pass in the function batchnorm_forward. Once you have done so, run the following to test your implementation.
def batchnorm_forward(x, gamma, beta, bn_param):
    """
    Forward pass for batch normalization.

    During training the sample mean and (uncorrected) sample variance are
    computed from minibatch statistics and used to normalize the incoming data.
    During training we also keep an exponentially decaying running mean of the
    mean and variance of each feature, and these averages are used to normalize
    data at test-time.

    At each timestep we update the running averages for mean and variance using
    an exponential decay based on the momentum parameter:

    running_mean = momentum * running_mean + (1 - momentum) * sample_mean
    running_var = momentum * running_var + (1 - momentum) * sample_var

    Note that the batch normalization paper suggests a different test-time
    behavior: they compute sample mean and variance for each feature using a
    large number of training images rather than using a running average. For
    this implementation we have chosen to use running averages instead since
    they do not require an additional estimation step; the torch7
    implementation of batch normalization also uses running averages.

    Input:
    - x: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var: Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - out: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
    """
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)

    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))

    out, cache = None, None
    if mode == 'train':
        batch_mean = np.mean(x, axis=0)
        batch_var = np.var(x, axis=0)
        # Keep running averages of the mean and variance for use at test time
        running_mean = momentum * running_mean + (1 - momentum) * batch_mean
        running_var = momentum * running_var + (1 - momentum) * batch_var
        # Normalize; note that eps goes inside the square root
        x_std = (x - batch_mean) / np.sqrt(batch_var + eps)
        out = gamma * x_std + beta
        # Cache everything the backward pass needs
        cache = (gamma, x_std, beta, x, batch_mean, batch_var, eps)
    elif mode == 'test':
        #######################################################################
        # TODO: Implement the test-time forward pass for batch normalization. #
        # Use the running mean and variance to normalize the incoming data,   #
        # then scale and shift the normalized data using gamma and beta.      #
        # Store the result in the out variable.                               #
        #######################################################################
        x_std = (x - running_mean) / np.sqrt(running_var + eps)
        out = gamma * x_std + beta
    else:
        raise ValueError('Invalid forward batchnorm mode "%s"' % mode)

    # Store the updated running means back into bn_param
    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var

    return out, cache
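Before moving on to the backward pass, it is worth running a quick sanity check in the spirit of the notebook cell above (a minimal sketch; it assumes batchnorm_forward above is importable, e.g. from cs231n.layers): after training-mode normalization with gamma = 1 and beta = 0, every feature column should have mean close to 0 and standard deviation close to 1.

import numpy as np

np.random.seed(231)
N, D = 200, 3
x = np.random.randn(N, D) * 10 + 5   # deliberately not zero-mean / unit-variance
gamma = np.ones(D)
beta = np.zeros(D)

out, _ = batchnorm_forward(x, gamma, beta, {'mode': 'train'})
print('means:', out.mean(axis=0))    # should be close to 0
print('stds: ', out.std(axis=0))     # should be close to 1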
1.2 Backward pass of the BN layer
  • Now implement the backward pass for batch normalization in the function batchnorm_backward.
  • To derive the backward pass you should write out the computation graph for batch normalization and backprop through each of the intermediate nodes. Some intermediates may have multiple outgoing branches; make sure to sum gradients across these branches in the backward pass.
def batchnorm_backward(dout, cache):
    """
    Backward pass for batch normalization.

    For this implementation, you should write out a computation graph for
    batch normalization on paper and propagate gradients backward through
    intermediate nodes.

    Inputs:
    - dout: Upstream derivatives, of shape (N, D)
    - cache: Variable of intermediates from batchnorm_forward.

    Returns a tuple of:
    - dx: Gradient with respect to inputs x, of shape (N, D)
    - dgamma: Gradient with respect to scale parameter gamma, of shape (D,)
    - dbeta: Gradient with respect to shift parameter beta, of shape (D,)
    """
    gamma, x_std, beta, x, batch_mean, batch_var, eps = cache
    N = x.shape[0]
    # out = gamma * x_std + beta
    dbeta = np.sum(dout, axis=0)
    dgamma = np.sum(dout * x_std, axis=0)
    dx_std = dout * gamma

    # x_std = (x - mean) / std
    # Note that x feeds several nodes of the computation graph: the centered
    # output, the variance, and the mean. Several gradient branches therefore
    # flow back into x and must be summed.
    a = np.sqrt(batch_var + eps)
    # gradient with respect to the batch variance first
    dvar = np.sum(-0.5 * (x - batch_mean) * dx_std / a ** 3, axis=0)
    dmean = np.sum(-dx_std / a, axis=0) + dvar * np.sum(-2 * (x - batch_mean), axis=0) / N
    dx = dx_std / a + dmean / N + 2 * dvar * (x - batch_mean) / N
    return dx, dgamma, dbeta
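The notebook checks this implementation with the cs231n numerical-gradient utilities. The sketch below does the same kind of check by hand for dgamma using central differences (it assumes batchnorm_forward and batchnorm_backward above are importable); the discrepancy should be tiny.

import numpy as np

np.random.seed(231)
N, D = 10, 4
x = 5 * np.random.randn(N, D) + 12
gamma = np.random.randn(D)
beta = np.random.randn(D)
dout = np.random.randn(N, D)

out, cache = batchnorm_forward(x, gamma, beta, {'mode': 'train'})
dx, dgamma, dbeta = batchnorm_backward(dout, cache)

# central-difference estimate of dgamma
h = 1e-5
dgamma_num = np.zeros_like(gamma)
for i in range(D):
    gp, gm = gamma.copy(), gamma.copy()
    gp[i] += h
    gm[i] -= h
    outp, _ = batchnorm_forward(x, gp, beta, {'mode': 'train'})
    outm, _ = batchnorm_forward(x, gm, beta, {'mode': 'train'})
    dgamma_num[i] = np.sum((outp - outm) * dout) / (2 * h)
print('dgamma error:', np.max(np.abs(dgamma - dgamma_num)))  # should be ~1e-9 or smaller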
  • When deriving the backward pass, we can draw the computation graph and backprop through it node by node, as above; alternatively, we can derive the gradient with respect to the input directly on paper and implement it in a single step, which is usually a bit faster:
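Writing $\hat{x}_i$ for the normalized input, the standard simplification of the gradient with respect to the input is, per feature dimension (the sums run over the $N$ examples of the minibatch):

$$\frac{\partial L}{\partial x_i} = \frac{\gamma}{N\sqrt{\sigma_B^2+\epsilon}}\left(N\,\frac{\partial L}{\partial y_i} - \sum_{j=1}^{N}\frac{\partial L}{\partial y_j} - \hat{x}_i\sum_{j=1}^{N}\frac{\partial L}{\partial y_j}\,\hat{x}_j\right)$$

The implementation below computes an equivalent expression, keeping the intermediate sums explicit.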
def batchnorm_backward_alt(dout, cache):
  """
    Alternative backward pass for batch normalization.

    For this implementation you should work out the derivatives for the batch
    normalizaton backward pass on paper and simplify as much as possible. You
    should be able to derive a simple expression for the backward pass. 
    See the jupyter notebook for more hints.
     
    Note: This implementation should expect to receive the same cache variable
    as batchnorm_backward, but might not use all of the values in the cache.

    Inputs / outputs: Same as batchnorm_backward
    """
    gamma, x_std, beta, x, batch_mean, batch_var, eps = cache
    ###########################################################################
    # TODO: Implement the backward pass for batch normalization. Store the    #
    # results in the dx, dgamma, and dbeta variables.                         #
    #                                                                         #
    # After computing the gradient with respect to the centered inputs, you   #
    # should be able to compute gradients with respect to the inputs in a     #
    # single statement; our implementation fits on a single 80-character line.#
    ###########################################################################
    N = x.shape[0]
    # Compute the scale and shift gradients first; they are straightforward
    dgamma = np.sum(dout * x_std, axis=0)
    dbeta = np.sum(dout, axis=0)
    # Then the gradient with respect to x, using the simplified expression
    a = 1 / np.sqrt(batch_var + eps)
    dx_hat = dout * gamma
    # dvar = np.sum(dx_hat * (x - batch_mean) * (-0.5) * a ** 3, axis=0)
    # dmean = np.sum(-dx_hat * a, axis=0)  # the dvar term vanishes since sum(x - batch_mean) = 0
    dx = (dx_hat * a
          + np.sum(dx_hat * (x - batch_mean) * (-0.5) * a ** 3, axis=0) * 2 * (x - batch_mean) / N
          + np.sum(-dx_hat * a, axis=0) / N)
    return dx, dgamma, dbeta
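The notebook then compares the two backward implementations: they should agree to numerical precision, and the simplified version is typically noticeably faster. A rough sketch of such a comparison (exact timings vary by machine; it assumes both functions above are importable):

import time
import numpy as np

np.random.seed(231)
N, D = 100, 500
x = 5 * np.random.randn(N, D) + 12
gamma = np.random.randn(D)
beta = np.random.randn(D)
dout = np.random.randn(N, D)

out, cache = batchnorm_forward(x, gamma, beta, {'mode': 'train'})

t0 = time.time()
dx1, dgamma1, dbeta1 = batchnorm_backward(dout, cache)
t1 = time.time()
dx2, dgamma2, dbeta2 = batchnorm_backward_alt(dout, cache)
t2 = time.time()

print('dx difference:', np.max(np.abs(dx1 - dx2)))  # should be essentially zero
print('naive: %fs, simplified: %fs' % (t1 - t0, t2 - t1))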
1.3 Fully Connected Nets with Batch Normalization
  • Now that you have a working implementation for batch normalization, go back to your FullyConnectedNet in the file cs231n/classifiers/fc_net.py. Modify your implementation to add batch normalization.
  • That is, we add BN layers to the fully connected network we implemented earlier.
  • You might find it useful to define an additional helper layer similar to those in the file cs231n/layer_utils.py.

Step 1: define a new affine -> batchnorm -> relu convenience layer in layer_utils.py.

def affine_bn_relu_forward(x, w, b, gamma, beta, bn_params):
    """
    Convenience layer that performs an affine transform followed by batch
    normalization and a ReLU.

    Inputs:
    - x: Input to the affine layer
    - w, b: Weights for the affine layer
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_params: Dictionary of batchnorm parameters (mode, eps, momentum,
      running_mean, running_var), as expected by batchnorm_forward

    Returns a tuple of:
    - out: Output from the ReLU
    - cache: Object to give to the backward pass
    """
    a_fc, fc_cache = affine_forward(x, w, b)
    a_bn, bn_cache = batchnorm_forward(a_fc, gamma, beta, bn_params)
    out, relu_cache = relu_forward(a_bn)
    cache = (fc_cache, bn_cache, relu_cache)
    return out, cache
def affine_bn_relu_backward(dout, cache):
    """
    Backward pass for the affine-bn-relu convenience layer.
    """
    fc_cache, bn_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    da_bn, dgamma, dbeta = batchnorm_backward(da, bn_cache)
    dx, dw, db = affine_backward(da_bn, fc_cache)
    return dx, dw, db, dgamma, dbeta
  • Then update our earlier fc_net.py:
from builtins import range
from builtins import object
import numpy as np

from cs231n.layers import *
from cs231n.layer_utils import *
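Inside FullyConnectedNet.loss, the change is that every hidden layer calls affine_bn_relu_forward / affine_bn_relu_backward instead of affine_relu_forward / affine_relu_backward, with one (gamma, beta) pair and one bn_param dictionary per hidden layer. Below is a minimal sketch of the forward part of that idea; it assumes the usual parameter-naming convention 'W1', 'b1', 'gamma1', 'beta1', ... and a per-layer list of bn_param dictionaries, and the helper name fc_net_forward_with_bn is only for illustration, not part of the assignment code.

def fc_net_forward_with_bn(X, params, bn_params, num_layers):
    """
    Illustrative forward pass of a fully connected net with batchnorm:
    every hidden layer is affine - batchnorm - relu, the last layer is
    affine only.
    - params: dict with 'W1'..'WL', 'b1'..'bL' and, for hidden layers,
      'gamma1'..'gamma(L-1)', 'beta1'..'beta(L-1)'
    - bn_params: list with one bn_param dict per hidden layer
    - num_layers: total number of layers L
    """
    caches = []
    out = X
    for i in range(1, num_layers):
        out, cache = affine_bn_relu_forward(
            out,
            params['W%d' % i], params['b%d' % i],
            params['gamma%d' % i], params['beta%d' % i],
            bn_params[i - 1])
        caches.append(cache)
    # last layer: plain affine to produce the class scores
    scores, last_cache = affine_forward(
        out, params['W%d' % num_layers], params['b%d' % num_layers])
    caches.append(last_cache)
    return scores, caches

The backward pass mirrors this loop in reverse: affine_backward for the last layer, then affine_bn_relu_backward for each hidden layer, storing the returned dgamma and dbeta into grads alongside dW and db.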