[Collection] A Summary of Normalization Methods in CV

Article link: https://blog.csdn.net/wangdongwei0/article/details/110873999

[Figure: comparison of normalization methods, including Batch Normalization and Batch Group Normalization (this figure is from the BGN paper)]

Batch Normalization

Paper: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", 2015

Link: https://arxiv.org/pdf/1502.03167.pdf

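BatchNorm2d is the most common normalization method. It takes input in NCHW format, i.e. shape (N, C, H, W). For each channel, it computes the mean (E[x]) and variance (Var[x]) over the mini-batch, that is, over the N, H and W dimensions. γ and β are learnable parameters, and each is a vector of length C.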
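According to the PyTorch documentation, BatchNorm2d computes y = (x − E[x]) / √(Var[x] + ε) · γ + β. Here ε (eps, default 1e-5) is a small constant added for numerical stability. The minimal sketch below spells this out in plain Python for training mode only, without running statistics. It assumes the input is a nested list of shape (N, C, H, W). The name `batch_norm_2d` is a hypothetical helper, not the PyTorch API. As in PyTorch, the variance used for normalization is the biased one, i.e. divided by N·H·W.

```python
import math


def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Hypothetical training-mode BatchNorm2d sketch on a nested list (N, C, H, W).

    For each channel c, the mean and biased variance are taken over the
    N, H and W axes; gamma and beta are per-channel lists of length C.
    """
    n = len(x)
    num_channels = len(x[0])
    # Output with the same nested (N, C, H, W) structure, filled in below.
    out = [[[[0.0] * len(row) for row in x[i][c]] for c in range(num_channels)]
           for i in range(n)]
    for c in range(num_channels):
        # Gather every value of channel c across all images and positions.
        vals = [v for i in range(n) for row in x[i][c] for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)  # biased estimator
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            for h, row in enumerate(x[i][c]):
                for w, v in enumerate(row):
                    # Normalize, then scale by gamma[c] and shift by beta[c].
                    out[i][c][h][w] = (v - mean) * inv_std * gamma[c] + beta[c]
    return out


# Example: N=2 images, C=2 channels, H=1, W=2.
x = [
    [[[1.0, 2.0]], [[10.0, 10.0]]],
    [[[3.0, 4.0]], [[20.0, 20.0]]],
]
y = batch_norm_2d(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

Before γ and β rescale and shift it, each output channel has mean ≈ 0 and variance ≈ 1 across the whole mini-batch.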

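Purpose: to address the Internal Covariate Shift (ICS) problem. BN is applied in the forward pass of a CNN, where it speeds up convergence and improves the model's results.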

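Cause of ICS: a deep neural network stacks many layers, and every parameter update in a layer changes the distribution of the inputs to the layers above it. As these changes compound layer by layer, the input distribution of the higher layers can shift drastically. The higher layers then have to keep readapting to the parameter updates of the lower layers.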

Implementation: https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html?highlight=norm#torch.nn.BatchNorm2d

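Note that BatchNorm2d also has a momentum parameter. It is used when updating running_mean and running_var, and it is computed differently from the momentum used in optimizers. The update rule is:

$\hat{x}_{\text{new}} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$

where $\hat{x}$ is the running estimate (running_mean or running_var) and $x_t$ is the statistic observed on the current mini-batch. The default momentum is 0.1.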

Note: for how momentum is implemented in optimizers, see PyTorch's SGD documentation: https://pytorch.org/docs/stable/optim.html?highlight=momentum
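The minimal sketch below puts the two update rules side by side using scalar values. Both function names are hypothetical helpers.

- `update_running_stat` follows BatchNorm's rule: new = (1 − momentum) · running + momentum · batch_stat. Here momentum weights the new batch statistic.
- `sgd_momentum_step` follows PyTorch's SGD formulation, assuming dampening = 0, no Nesterov and no weight decay. Here momentum weights the old accumulated buffer.

In PyTorch, running_mean starts at 0 and running_var at 1. The batch variance fed into running_var is the unbiased estimate.

```python
def update_running_stat(running, batch_stat, momentum=0.1):
    """Hypothetical helper: BatchNorm-style running-statistic update.

    momentum weights the NEW observation (PyTorch's default is 0.1).
    """
    return (1.0 - momentum) * running + momentum * batch_stat


def sgd_momentum_step(param, grad, buf, lr=0.1, mu=0.9):
    """Hypothetical helper: one SGD-with-momentum step (dampening=0, no Nesterov).

    buf = mu * buf + grad; param -= lr * buf. Here mu weights the OLD buffer.
    """
    buf = mu * buf + grad
    return param - lr * buf, buf


running = 0.0  # running_mean starts at 0 in PyTorch
for _ in range(3):
    running = update_running_stat(running, batch_stat=5.0)
print(running)  # about 1.355
```

Repeatedly observing a batch mean of 5.0 moves the running estimate to 0.5, then 0.95, then 1.355. With momentum = 0.1 it adapts slowly. By contrast, an SGD momentum of 0.9 keeps 90% of the old velocity at every step.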

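Drawback: BN breaks down when the batch size is too small, because the per-batch mean and variance become noisy estimates of the true statistics. (It "cannot be applied to online learning tasks or to extremely large distributed models where the minibatches have to be small".)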
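The tiny example below shows why small batches hurt. It assumes a channel whose feature map is 1×1, so each sample contributes a single value, and sets γ = 1 and β = 0 for simplicity. `channel_batch_norm` is a hypothetical helper.

- With batch size 1, the batch mean equals the input and the variance is 0, so the output is 0 whatever the input was.
- With batch size 2, every output is ≈ ±1.

In both cases the batch statistics, not the input, determine the result.

```python
import math


def channel_batch_norm(values, eps=1e-5):
    """Hypothetical helper: normalize the values of ONE channel using the
    current mini-batch's statistics (gamma = 1, beta = 0)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Batch size 1: the input is completely erased.
print(channel_batch_norm([3.7]))    # [0.0]
print(channel_batch_norm([-42.0]))  # [0.0]

# Batch size 2: the sample 1.0 maps to about -1 no matter how far away
# the other sample is.
print(channel_batch_norm([1.0, 3.0]))
print(channel_batch_norm([1.0, 100.0]))
```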

LayerNorm

Paper: "Layer Normalization", 2016

Link: https://arxiv.org/pdf/1607.06450.pdf

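BN's effectiveness depends on the batch size, and BN cannot be applied to RNNs, whose input sequence lengths vary. For these reasons, this paper proposes Layer Normalization (LN). BN normalizes the same channel across the different images in a mini-batch. LN instead normalizes across the different channels (together with their spatial positions) of a single image, so the result no longer depends on the batch size.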
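The minimal sketch below applies LN to image tensors. It assumes the statistics are taken over all of C, H and W for each sample. `layer_norm` is a hypothetical helper, and the learnable affine parameters are omitted for brevity.

```python
import math


def layer_norm(x, eps=1e-5):
    """Hypothetical helper: LayerNorm over the (C, H, W) dims of each sample
    in a nested list of shape (N, C, H, W); no learnable gamma/beta."""
    out = []
    for sample in x:
        # Flatten all channels and spatial positions of ONE sample.
        vals = [v for ch in sample for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([[[(v - mean) * inv_std for v in row] for row in ch]
                    for ch in sample])
    return out


a = [[[1.0, 2.0]], [[3.0, 4.0]]]   # one sample: C=2, H=1, W=2
b = [[[10.0, 0.0]], [[5.0, 5.0]]]  # another sample
alone = layer_norm([a])[0]
in_batch = layer_norm([a, b])[0]
print(alone == in_batch)  # True
```

No statistic crosses the batch dimension, so a sample gets exactly the same output whether it is normalized alone or in a batch. In PyTorch, the analogous module would be nn.LayerNorm(normalized_shape=[C, H, W]). By default it also learns an element-wise weight and bias of shape (C, H, W), unlike BN's per-channel γ and β.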

Implementation: https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html#torch.nn.LayerNorm

Note: judging from the paper, LN appears to have been designed mainly for RNNs, since BN is not suitable for RNNs.