CS231N - Batch Normalization

Why Use BN

  • After passing through layer after layer of nonlinear transformations, the distribution a network learns at each layer is hard to predict, and because the parameters keep being updated, every layer's input distribution keeps shifting, which makes the network hard to converge. To train the network properly, one has to

    • keep the learning rate from being too high

    • initialize the parameters carefully every time

    • avoid making the network too deep

  • Researchers found that this phenomenon is caused by the per-layer distributions differing too much and being unpredictable; if every mini-batch of samples follows a similar distribution at every layer, the problem is resolved

  • Standardization does not change the shape of the data's distribution; it only translates and scales the data within its original space (see the sketch below)
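
The claim above can be made concrete with a minimal NumPy sketch (purely illustrative; the array shapes and the small constant eps are assumed): standardization only subtracts the per-feature mean (a translation) and divides by the per-feature standard deviation (a scaling).

```python
import numpy as np

# toy batch of activations: N samples, D features, with non-zero mean and non-unit variance
x = np.random.randn(64, 100) * 3.0 + 5.0

mu = x.mean(axis=0)                  # per-feature mean, shape (D,)
sigma = x.std(axis=0)                # per-feature standard deviation, shape (D,)
x_hat = (x - mu) / (sigma + 1e-5)    # translate by -mu, then scale by 1/sigma

# each feature now has roughly zero mean and unit variance,
# but the shape of its distribution is unchanged
print(x_hat.mean(axis=0)[:3], x_hat.std(axis=0)[:3])
```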

BN During Training

We do not want every layer to end up with exactly the same distribution, so a learnable affine transform with parameters γ and β is added; γ and β are learned along with the other parameters. If learning goes well, then when γ equals the standard deviation and β equals the mean, y becomes an identity mapping of x.
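
As a minimal sketch of the training-time forward pass (function and variable names such as batchnorm_forward_train, gamma, beta, and eps are illustrative, not taken from the course assignments):

```python
import numpy as np

def batchnorm_forward_train(x, gamma, beta, eps=1e-5):
    """Training-time BN over a mini-batch x of shape (N, D).

    gamma, beta: learnable scale and shift, one per feature (shape (D,)).
    If gamma learns the batch standard deviation and beta the batch mean,
    y reduces to (approximately) an identity mapping of x.
    """
    mu = x.mean(axis=0)                      # batch mean, shape (D,)
    var = x.var(axis=0)                      # batch variance, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)    # standardize the batch
    y = gamma * x_hat + beta                 # learnable affine transform
    return y, (mu, var)                      # batch stats are reused for the running averages
```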

BN at Test Time

At test time we need statistics over the whole training set rather than a single batch: the global mean is taken as the expectation of the per-batch means from training, and the global variance as the unbiased estimate formed from the per-batch variances.

In practice, the training-set mean and variance are accumulated with an exponential moving average, much like momentum: at each update, the running mean becomes (1 − m) times the previous running mean plus m times the current batch mean (and the running variance is updated the same way).
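
A sketch of how the running statistics might be maintained and then used at test time is below; the momentum value m = 0.1 is an assumed example, and the update follows the convention in the text (keep (1 − m) of the past statistic, add m of the current batch statistic):

```python
import numpy as np

def update_running_stats(running_mean, running_var, batch_mean, batch_var, m=0.1):
    # exponential moving average of the batch statistics, analogous to momentum;
    # batch_var would typically be the unbiased estimate, e.g. np.var(x, axis=0, ddof=1)
    running_mean = (1 - m) * running_mean + m * batch_mean
    running_var = (1 - m) * running_var + m * batch_var
    return running_mean, running_var

def batchnorm_forward_test(x, gamma, beta, running_mean, running_var, eps=1e-5):
    # at test time the fixed global statistics replace the per-batch ones
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta
```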

Pros and Cons of BN

  • Pros

    • Larger learning rates can be tried, which speeds up convergence

    • Less sensitive to parameter initialization

    • Alleviates the vanishing-gradient problem by keeping outputs away from the saturated regions of the activation function

    • Also provides a regularizing effect

  • Cons

    • Only effective when each batch contains a sufficiently large number of samples

    • Performs poorly on RNNs and sequence data

Batch normalization (BN) is a widely used technique in deep learning that improves training stability and speed by normalizing the input to each layer. However, several more recent techniques go beyond batch normalization and aim to address some of its limitations:

1. Group normalization (GN): GN replaces the batch dimension with a group dimension. Instead of computing the mean and variance over the batch dimension, GN computes them over groups of channels for each sample separately. GN has been shown to outperform BN at small batch sizes.

2. Layer normalization (LN): LN normalizes the activations of each layer across the feature dimension. Unlike BN, LN does not depend on the batch size and can be applied to recurrent neural networks (RNNs) and other models that process variable-length sequences.

3. Instance normalization (IN): IN normalizes the activations of each instance (e.g., an image) within each channel. IN performs well on style transfer and other tasks that involve manipulating the appearance of an image.

4. Switchable normalization (SN): SN combines different normalization methods (e.g., BN, GN, and LN) into a single trainable module that learns to weight the methods according to the input data and task.

These techniques represent some of the recent developments in normalization that go beyond batch normalization. While BN remains useful, these alternatives provide additional flexibility and better performance in certain scenarios.
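
As a rough illustration of how these variants differ, the sketch below (NumPy, assuming a 4-D activation tensor of shape (N, C, H, W); the group count G is an arbitrary example value) shows which axes each method averages over:

```python
import numpy as np

x = np.random.randn(8, 32, 16, 16)   # activations of shape (N, C, H, W)
eps = 1e-5

def normalize(t, axes):
    # zero-mean, unit-variance over the given axes
    mu = t.mean(axis=axes, keepdims=True)
    var = t.var(axis=axes, keepdims=True)
    return (t - mu) / np.sqrt(var + eps)

bn = normalize(x, (0, 2, 3))          # batch norm: over batch + spatial dims, per channel
ln = normalize(x, (1, 2, 3))          # layer norm: over all features of each sample
instn = normalize(x, (2, 3))          # instance norm: over spatial dims of each (sample, channel)

G = 4                                 # group norm: split the C channels into G groups
gn = normalize(x.reshape(8, G, 32 // G, 16, 16), (2, 3, 4)).reshape(x.shape)
```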
