[CS231n Assignment 2 #02] Batch Normalization

Assignment overview
  • Assignment homepage: Assignment 2
  • Goal: one way to make deep neural networks train better is to use more sophisticated optimization methods such as SGD+Momentum, Adam, or RMSProp; another is to change the network architecture, for example by adding the batch normalization layers we implement in this section.
  • Official starter code: Assignment 2 code
  • Assignment source file: BatchNormalization.ipynb
1. Batch Normalization

Machine learning methods tend to work well when the input data are uncorrelated, zero-mean, and unit-variance. But when we train a deep neural network, even if we preprocess the data so the inputs follow such a distribution, the transformations applied by successive layers change it. Worse, as the weights keep being updated, the distribution of the features fed into each layer keeps drifting.
The authors of the batch normalization paper (recommended reading [1]) hypothesize that this drift in the input feature distribution makes deep networks hard to train, and propose inserting a batch normalization (BN) layer to deal with it.
At training time, we use a minibatch of data to estimate the mean and variance of each feature dimension, and use these statistics to normalize the incoming minibatch to zero mean and unit variance. At the same time, we maintain running averages of the mean and variance over the training set, which are used to normalize data at test time.
However, such a BN layer may hurt the network's representational power by changing the distribution a layer sees as input: for some layers a distribution that is not zero-mean and unit-variance might actually work better. So each BN layer also learns a per-feature shift factor and scale factor that let it partially restore the feature distribution, so the output need not strictly follow the standardized distribution, which keeps the network expressive.
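Concretely, for a minibatch of $N$ examples the BN layer computes, independently for each feature dimension (these are the standard formulas from the paper; $\epsilon$ is a small constant for numerical stability):

$$\mu_B = \frac{1}{N}\sum_{i=1}^{N} x_i,\qquad \sigma_B^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu_B\right)^2,\qquad \hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}},\qquad y_i = \gamma\,\hat{x}_i+\beta$$

Here $\gamma$ (scale) and $\beta$ (shift) are the learned per-feature parameters; at test time $\mu_B$ and $\sigma_B^2$ are replaced by the running averages accumulated during training.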

1.1 Forward pass of the BN layer
  • In the file cs231n/layers.py, implement the batch normalization forward pass in the function batchnorm_forward. Once you have done so, run the following to test your implementation.
def batchnorm_forward(x, gamma, beta, bn_param):
    """
    Forward pass for batch normalization.

    During training the sample mean and (uncorrected) sample variance are
    computed from minibatch statistics and used to normalize the incoming data.
    During training we also keep an exponentially decaying running mean of the
    mean and variance of each feature, and these averages are used to normalize
    data at test-time.

    At each timestep we update the running averages for mean and variance using
    an exponential decay based on the momentum parameter:

    running_mean = momentum * running_mean + (1 - momentum) * sample_mean
    running_var = momentum * running_var + (1 - momentum) * sample_var

    Note that the batch normalization paper suggests a different test-time
    behavior: they compute sample mean and variance for each feature using a
    large number of training images rather than using a running average. For
    this implementation we have chosen to use running averages instead since
    they do not require an additional estimation step; the torch7
    implementation of batch normalization also uses running averages.

    Input:
    - x: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var: Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - out: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
    """
    mode = bn_param['mode']
    eps = bn_param.get('eps', 1e-5)
    momentum = bn_param.get('momentum', 0.9)

    N, D = x.shape
    running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype))

    out, cache = None, None
    if mode == 'train':
        batch_mean = np.mean(x, axis=0)
        batch_var = np.var(x, axis=0)
        # Keep running averages of the mean and variance for use at test time
        running_mean = momentum * running_mean + (1 - momentum) * batch_mean
        running_var = momentum * running_var + (1 - momentum) * batch_var
        # Normalize; note that eps goes inside the square root
        x_std = (x - batch_mean) / np.sqrt(batch_var + eps)
        out = gamma * x_std + beta
        # Cache everything the backward pass needs
        cache = (gamma, x_std, beta, x, batch_mean, batch_var, eps)
    elif mode == 'test':
        #######################################################################
        # TODO: Implement the test-time forward pass for batch normalization. #
        # Use the running mean and variance to normalize the incoming data,   #
        # then scale and shift the normalized data using gamma and beta.      #
        # Store the result in the out variable.                               #
        #######################################################################
        x_std = (x - running_mean) / np.sqrt(running_var + eps)
        out = gamma * x_std + beta
    else:
        raise ValueError('Invalid forward batchnorm mode "%s"' % mode)

    # Store the updated running means back into bn_param
    bn_param['running_mean'] = running_mean
    bn_param['running_var'] = running_var

    return out, cache
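Before moving on to the backward pass, it is worth running a quick sanity check in the spirit of the notebook cell above (a minimal sketch; it assumes batchnorm_forward above is importable, e.g. from cs231n.layers): after training-mode normalization with gamma = 1 and beta = 0, every feature column should have mean close to 0 and standard deviation close to 1.

import numpy as np

np.random.seed(231)
N, D = 200, 3
x = np.random.randn(N, D) * 10 + 5   # deliberately not zero-mean / unit-variance
gamma = np.ones(D)
beta = np.zeros(D)

out, _ = batchnorm_forward(x, gamma, beta, {'mode': 'train'})
print('means:', out.mean(axis=0))    # should be close to 0
print('stds: ', out.std(axis=0))     # should be close to 1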
1.2 Backward pass of the BN layer
  • Now implement the backward pass for batch normalization in the function batchnorm_backward.
  • To derive the backward pass you should write out the computation graph for batch normalization and backprop through each of the intermediate nodes. Some intermediates may have multiple outgoing branches; make sure to sum gradients across these branches in the backward pass.
def batchnorm_backward(dout, cache):
    """
    Backward pass for batch normalization.

    For this implementation, you should write out a computation graph for
    batch normalization on paper and propagate gradients backward through
    intermediate nodes.

    Inputs:
    - dout: Upstream derivatives, of shape (N, D)
    - cache: Variable of intermediates from batchnorm_forward.

    Returns a tuple of:
    - dx: Gradient with respect to inputs x, of shape (N, D)
    - dgamma: Gradient with respect to scale parameter gamma, of shape (D,)
    - dbeta: Gradient with respect to shift parameter beta, of shape (D,)
    """
    gamma, x_std, beta, x, batch_mean, batch_var, eps = cache
    N = x.shape[0]
    # out = gamma * x_std + beta
    dbeta = np.sum(dout, axis=0)
    dgamma = np.sum(dout * x_std, axis=0)
    dx_std = dout * gamma

    # x_std = (x - mean) / std
    # Note that x feeds several nodes of the computation graph: the centered
    # output, the variance, and the mean. Several gradient branches therefore
    # flow back into x and must be summed.
    a = np.sqrt(batch_var + eps)
    # gradient with respect to the batch variance first
    dvar = np.sum(-0.5 * (x - batch_mean) * dx_std / a ** 3, axis=0)
    dmean = np.sum(-dx_std / a, axis=0) + dvar * np.sum(-2 * (x - batch_mean), axis=0) / N
    dx = dx_std / a + dmean / N + 2 * dvar * (x - batch_mean) / N
    return dx, dgamma, dbeta
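The notebook checks this implementation with the cs231n numerical-gradient utilities. The sketch below does the same kind of check by hand for dgamma using central differences (it assumes batchnorm_forward and batchnorm_backward above are importable); the discrepancy should be tiny.

import numpy as np

np.random.seed(231)
N, D = 10, 4
x = 5 * np.random.randn(N, D) + 12
gamma = np.random.randn(D)
beta = np.random.randn(D)
dout = np.random.randn(N, D)

out, cache = batchnorm_forward(x, gamma, beta, {'mode': 'train'})
dx, dgamma, dbeta = batchnorm_backward(dout, cache)

# central-difference estimate of dgamma
h = 1e-5
dgamma_num = np.zeros_like(gamma)
for i in range(D):
    gp, gm = gamma.copy(), gamma.copy()
    gp[i] += h
    gm[i] -= h
    outp, _ = batchnorm_forward(x, gp, beta, {'mode': 'train'})
    outm, _ = batchnorm_forward(x, gm, beta, {'mode': 'train'})
    dgamma_num[i] = np.sum((outp - outm) * dout) / (2 * h)
print('dgamma error:', np.max(np.abs(dgamma - dgamma_num)))  # should be ~1e-9 or smaller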
  • When deriving the backward pass, we can draw the computation graph and backprop through it node by node, as above; alternatively, we can derive the gradient with respect to the input directly on paper and implement it in a single step, which is usually a bit faster:
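Writing $\hat{x}_i$ for the normalized input, the standard simplification of the gradient with respect to the input is, per feature dimension (the sums run over the $N$ examples of the minibatch):

$$\frac{\partial L}{\partial x_i} = \frac{\gamma}{N\sqrt{\sigma_B^2+\epsilon}}\left(N\,\frac{\partial L}{\partial y_i} - \sum_{j=1}^{N}\frac{\partial L}{\partial y_j} - \hat{x}_i\sum_{j=1}^{N}\frac{\partial L}{\partial y_j}\,\hat{x}_j\right)$$

The implementation below computes an equivalent expression, keeping the intermediate sums explicit.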
def batchnorm_backward_alt(dout, cache):
  """
    Alternative backward pass for batch normalization.

    For this implementation you should work out the derivatives for the batch
    normalizaton backward pass on paper and simplify as much as possible. You
    should be able to derive a simple expression for the backward pass. 
    See the jupyter notebook for more hints.
     
    Note: This implementation should expect to receive the same cache variable
    as batchnorm_backward, but might not use all of the values in the cache.

    Inputs / outputs: Same as batchnorm_backward
    """
    gamma, x_std, beta, x, batch_mean, batch_var, eps = cache
    ###########################################################################
    # TODO: Implement the backward pass for batch normalization. Store the    #
    # results in the dx, dgamma, and dbeta variables.                         #
    #                                                                         #
    # After computing the gradient with respect to the centered inputs, you   #
    # should be able to compute gradients with respect to the inputs in a     #
    # single statement; our implementation fits on a single 80-character line.#
    ###########################################################################
    N = x.shape[0]
    # Compute the scale and shift gradients first; they are straightforward
    dgamma = np.sum(dout * x_std, axis=0)
    dbeta = np.sum(dout, axis=0)
    # Then the gradient with respect to x, using the simplified expression
    a = 1 / np.sqrt(batch_var + eps)
    dx_hat = dout * gamma
    # dvar = np.sum(dx_hat * (x - batch_mean) * (-0.5) * a ** 3, axis=0)
    # dmean = np.sum(-dx_hat * a, axis=0)  # the dvar term vanishes since sum(x - batch_mean) = 0
    dx = (dx_hat * a
          + np.sum(dx_hat * (x - batch_mean) * (-0.5) * a ** 3, axis=0) * 2 * (x - batch_mean) / N
          + np.sum(-dx_hat * a, axis=0) / N)
    return dx, dgamma, dbeta
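The notebook then compares the two backward implementations: they should agree to numerical precision, and the simplified version is typically noticeably faster. A rough sketch of such a comparison (exact timings vary by machine; it assumes both functions above are importable):

import time
import numpy as np

np.random.seed(231)
N, D = 100, 500
x = 5 * np.random.randn(N, D) + 12
gamma = np.random.randn(D)
beta = np.random.randn(D)
dout = np.random.randn(N, D)

out, cache = batchnorm_forward(x, gamma, beta, {'mode': 'train'})

t0 = time.time()
dx1, dgamma1, dbeta1 = batchnorm_backward(dout, cache)
t1 = time.time()
dx2, dgamma2, dbeta2 = batchnorm_backward_alt(dout, cache)
t2 = time.time()

print('dx difference:', np.max(np.abs(dx1 - dx2)))  # should be essentially zero
print('naive: %fs, simplified: %fs' % (t1 - t0, t2 - t1))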
1.3 Fully Connected Nets with Batch Normalization
  • Now that you have a working implementation for batch normalization, go back to your FullyConnectedNet in the file cs231n/classifiers/fc_net.py. Modify your implementation to add batch normalization.
  • That is, we add BN layers to the fully connected network we implemented earlier.
  • You might find it useful to define an additional helper layer similar to those in the file cs231n/layer_utils.py.

Step 1: define a new affine -> batchnorm -> relu convenience layer in layer_utils.py.

def affine_bn_relu_forward(x, w, b, gamma, beta, bn_params):
    """
    Convenience layer that performs an affine transform followed by batch
    normalization and a ReLU.

    Inputs:
    - x: Input to the affine layer
    - w, b: Weights for the affine layer
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_params: Dictionary of batchnorm parameters (mode, eps, momentum,
      running_mean, running_var), as expected by batchnorm_forward

    Returns a tuple of:
    - out: Output from the ReLU
    - cache: Object to give to the backward pass
    """
    a_fc, fc_cache = affine_forward(x, w, b)
    a_bn, bn_cache = batchnorm_forward(a_fc, gamma, beta, bn_params)
    out, relu_cache = relu_forward(a_bn)
    cache = (fc_cache, bn_cache, relu_cache)
    return out, cache
def affine_bn_relu_backward(dout, cache):
    """
    Backward pass for the affine-bn-relu convenience layer.
    """
    fc_cache, bn_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    da_bn, dgamma, dbeta = batchnorm_backward(da, bn_cache)
    dx, dw, db = affine_backward(da_bn, fc_cache)
    return dx, dw, db, dgamma, dbeta
  • Then update our earlier fc_net.py:
from builtins import range
from builtins import object
import numpy as np

from cs231n.layers import *
from cs231n.layer_utils import *
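Inside FullyConnectedNet.loss, the change is that every hidden layer calls affine_bn_relu_forward / affine_bn_relu_backward instead of affine_relu_forward / affine_relu_backward, with one (gamma, beta) pair and one bn_param dictionary per hidden layer. Below is a minimal sketch of the forward part of that idea; it assumes the usual parameter-naming convention 'W1', 'b1', 'gamma1', 'beta1', ... and a per-layer list of bn_param dictionaries, and the helper name fc_net_forward_with_bn is only for illustration, not part of the assignment code.

def fc_net_forward_with_bn(X, params, bn_params, num_layers):
    """
    Illustrative forward pass of a fully connected net with batchnorm:
    every hidden layer is affine - batchnorm - relu, the last layer is
    affine only.
    - params: dict with 'W1'..'WL', 'b1'..'bL' and, for hidden layers,
      'gamma1'..'gamma(L-1)', 'beta1'..'beta(L-1)'
    - bn_params: list with one bn_param dict per hidden layer
    - num_layers: total number of layers L
    """
    caches = []
    out = X
    for i in range(1, num_layers):
        out, cache = affine_bn_relu_forward(
            out,
            params['W%d' % i], params['b%d' % i],
            params['gamma%d' % i], params['beta%d' % i],
            bn_params[i - 1])
        caches.append(cache)
    # last layer: plain affine to produce the class scores
    scores, last_cache = affine_forward(
        out, params['W%d' % num_layers], params['b%d' % num_layers])
    caches.append(last_cache)
    return scores, caches

The backward pass mirrors this loop in reverse: affine_backward for the last layer, then affine_bn_relu_backward for each hidden layer, storing the returned dgamma and dbeta into grads alongside dW and db.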