【GN】《Group Normalization》

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/137338518

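This post covers Group Normalization (GN), which was proposed to address the drop in Batch Normalization (BN) performance when training with small batches. GN shows advantages across tasks such as ImageNet classification, COCO object detection, and video classification, and it outperforms conventional methods especially with small batches and deep models.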

ECCV-2018
Facebook AI Research
For more paper notes, see [Paper Reading].

1 Background and Motivation

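Batch normalization (BN) degrades noticeably when the batch size is very small. Tasks such as object detection and segmentation use high-resolution inputs, so with a large network the batch size is usually small and BN becomes less effective.

The authors start from the observation that many classical features like SIFT and HOG are group-wise features and involve group-wise normalization.

On that basis they propose Group Normalization (GN) to reduce the impact of small batch sizes on normalization.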

2 Related Work

Normalization
LRN / BN / LN / IN / WN (weight normalization)
LN and IN are the two extremes of GN. They are effective for training sequential models (RNN/LSTM) or generative models (GAN), but have limited success in visual recognition.
Addressing small batches
Batch Renormalization (still struggles when the batch size is too small)
Group-wise computation
AlexNet / ResNeXt / MobileNet / Xception / ShuffleNet

3 Advantages / Contributions

Proposes Group Normalization (GN).

4 Method

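GN's computation is independent of batch sizes.

LN, IN, and GN all perform independent computations along the batch axis.

LN and IN are the two extreme cases of GN: with $G = 1$ GN becomes LN, and with $G = C$ it becomes IN.

In formula form, normalization subtracts the mean and divides by the standard deviation:

$\hat{x}_i = \frac{1}{\sigma_i}(x_i - \mu_i)$

Take with one hand, give back with the other: two learnable parameters, a scale $\gamma$ and a shift $\beta$, are learned to make up for what normalization removes:

$y_i = \gamma \hat{x}_i + \beta$

Here $i = (i_N, i_C, i_H, i_W)$ is a 4D index into a feature map laid out as $(N, C, H, W)$.

The mean and standard deviation are

$\mu_i = \frac{1}{m} \sum_{k \in S_i} x_k, \quad \sigma_i = \sqrt{\frac{1}{m} \sum_{k \in S_i} (x_k - \mu_i)^2 + \epsilon}$

$S_i$ is the set of pixels in which the mean and std are computed, and $m$ is the size of this set.

$\epsilon$ is a small constant that prevents division by zero.

The methods differ only in how $S_i$ is chosen:

BN: for a given channel, over $(N, H, W)$: $S_i = \{k \mid k_C = i_C\}$

LN: for a given sample in the batch, over $(C, H, W)$: $S_i = \{k \mid k_N = i_N\}$

IN: for a given sample and a given channel, over $(H, W)$: $S_i = \{k \mid k_N = i_N, k_C = i_C\}$

GN: for a given sample and a given group of channels: $S_i = \{k \mid k_N = i_N, \lfloor \frac{k_C}{C/G} \rfloor = \lfloor \frac{i_C}{C/G} \rfloor\}$

$G$ is the number of groups, 32 by default.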
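
The four choices of $S_i$ amount to averaging over different axes of an $(N, C, H, W)$ array. The snippet below is a small NumPy sketch of just that step (the means only), with arbitrary illustrative sizes and a hypothetical $G = 2$ rather than the default 32. The variable names are mine, not the paper's.

```python
import numpy as np

# Illustrative feature map with layout (N, C, H, W); sizes are arbitrary.
N, C, H, W = 4, 8, 5, 5
G = 2  # hypothetical small group count for the demo (the default in the paper is 32)
rng = np.random.default_rng(0)
x = rng.normal(size=(N, C, H, W))

# Each method averages over a different set S_i, i.e. over different axes.
bn_mean = x.mean(axis=(0, 2, 3))  # one mean per channel            -> shape (C,)
ln_mean = x.mean(axis=(1, 2, 3))  # one mean per sample             -> shape (N,)
in_mean = x.mean(axis=(2, 3))     # one mean per sample and channel -> shape (N, C)
# GN: split the C channels into G contiguous groups, then average inside each group.
gn_mean = x.reshape(N, G, C // G, H, W).mean(axis=(2, 3, 4))  # -> shape (N, G)

# m = |S_i|, the number of elements behind each statistic.
m = {"BN": N * H * W, "LN": C * H * W, "IN": H * W, "GN": (C // G) * H * W}

print(bn_mean.shape, ln_mean.shape, in_mean.shape, gn_mean.shape)
print(m)
```

Only BN's set includes the batch axis (axis 0), which is why it is the only one of the four whose statistics change with the batch size.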

The paper also gives a short TensorFlow implementation of GN (the code figure is not reproduced here).
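
As a rough stand-in for that listing, here is a minimal NumPy sketch of the GN forward pass. It is not the paper's TensorFlow code: the function name `group_norm`, the `(1, C, 1, 1)` shape for `gamma` and `beta`, and the `eps` default are my assumptions. The last lines check the two extreme cases, $G = 1$ (LN) and $G = C$ (IN).

```python
import numpy as np

def group_norm(x, gamma, beta, G, eps=1e-5):
    """Sketch of a GN forward pass.

    x: array of shape (N, C, H, W); gamma, beta: shape (1, C, 1, 1); G: number of groups.
    """
    N, C, H, W = x.shape
    assert C % G == 0, "C must be divisible by G"
    # Split channels into G contiguous groups of C // G channels each.
    xg = x.reshape(N, G, C // G, H, W)
    # Statistics are computed per sample and per group; the batch axis is never reduced.
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    # Per-channel scale and shift.
    return xg.reshape(N, C, H, W) * gamma + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 6, 4, 4))
gamma = np.ones((1, 6, 1, 1))
beta = np.zeros((1, 6, 1, 1))

y = group_norm(x, gamma, beta, G=3)

# Reference LN and IN computed directly, with the same eps.
ln_ref = (x - x.mean(axis=(1, 2, 3), keepdims=True)) / np.sqrt(
    x.var(axis=(1, 2, 3), keepdims=True) + 1e-5)
in_ref = (x - x.mean(axis=(2, 3), keepdims=True)) / np.sqrt(
    x.var(axis=(2, 3), keepdims=True) + 1e-5)

matches_ln = np.allclose(group_norm(x, gamma, beta, G=1), ln_ref)  # G = 1 is LN
matches_in = np.allclose(group_norm(x, gamma, beta, G=6), in_ref)  # G = C is IN
print(matches_ln, matches_in)
```

Because every reduction runs over axes 2 to 4 of the reshaped array, the output for one sample never depends on the other samples in the batch.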

5 Experiments

5.1 Datasets and Metrics

ImageNet: top-1 classification error
COCO Detection: mAP
COCO Segmentation: mmAP
Kinetics: accuracy

5.2 Image Classification in ImageNet

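(1) Comparison of feature normalization methods

At bs = 32, GN has the lowest training error, but its validation error is worse than BN's, which suggests GN generalizes less well than BN in this setting.

The authors' explanation:

BN's mean and variance computation introduces uncertainty caused by the stochastic batch sampling, which helps regularization

With $G = 32$, the number of channels per group is $C/32$, so it varies with the layer width. Where it happens to be 32, each GN statistic is computed over as many elements as a BN statistic at bs = 32. The difference is that one 32 lies along the batch axis and the other along the channel axis.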
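
To make that counting concrete, the sketch below compares $m$, the number of elements behind one statistic, for BN at bs = 32 and for GN with $G = 32$. The channel widths and the 14×14 spatial size are illustrative values I picked, not numbers from the post, and both helper functions are hypothetical.

```python
def bn_set_size(N, H, W):
    """Elements behind one BN statistic: all samples and positions of one channel."""
    return N * H * W

def gn_set_size(C, G, H, W):
    """Elements behind one GN statistic: all positions of one group in one sample."""
    assert C % G == 0, "C must be divisible by G"
    return (C // G) * H * W

G, batch = 32, 32
H = W = 14  # illustrative spatial size

for C in (64, 256, 1024, 2048):  # illustrative layer widths
    print(f"C={C}: {C // G} channels/group, "
          f"GN m={gn_set_size(C, G, H, W)}, BN m={bn_set_size(batch, H, W)}")
```

The two counts coincide only when $C/G$ equals the batch size (here $C = 1024$). For narrower layers each GN statistic sees fewer elements than BN at bs = 32, and for wider layers it sees more.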

At bs = 32, GN is not as good as BN.

(2) Small batch sizes

When the batch size is small, GN's advantage shows, and GN is insensitive to the batch size.

The benefit: This will make it possible to train higher capacity models that would be otherwise bottlenecked by memory limitation
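
The insensitivity to batch size follows directly from where the statistics come from. The NumPy sketch below normalizes the same sample once inside a batch of 8 and once inside a batch of 2. It uses training-mode BN without running averages and omits $\gamma$ and $\beta$ for both methods. Both function names and all sizes are assumptions for illustration.

```python
import numpy as np

def batch_norm_train(x, eps=1e-5):
    """Training-mode BN without gamma/beta: statistics over (N, H, W) per channel."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def group_norm_plain(x, G, eps=1e-5):
    """GN without gamma/beta: statistics per sample and per channel group."""
    N, C, H, W = x.shape
    xg = x.reshape(N, G, C // G, H, W)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    return ((xg - mean) / np.sqrt(var + eps)).reshape(N, C, H, W)

rng = np.random.default_rng(0)
big = rng.normal(size=(8, 4, 3, 3))  # batch of 8
small = big[:2]                      # the first 2 samples of the same batch

# Compare the normalized output of sample 0 under the two batch sizes.
gn_same = np.allclose(group_norm_plain(big, G=2)[0], group_norm_plain(small, G=2)[0])
bn_same = np.allclose(batch_norm_train(big)[0], batch_norm_train(small)[0])
print("GN output unchanged:", gn_same)
print("BN output unchanged:", bn_same)
```

GN gives the same result for sample 0 in both cases, while BN's result shifts because its mean and variance are re-estimated from whichever samples share the batch.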

(3) Comparison with Batch Renorm (BR)

With a batch size of 4, ResNet-50 trained with BR has an error rate of 26.3%.

BN: 27.3%

GN: 24.2%

(4) Group division

This compares results for different settings of $G$ and of the number of channels per group.
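
The two ways of configuring the groups, fixing $G$ or fixing the channels per group, can be reduced to the same quantity. The helper below is a hypothetical sketch (`resolve_num_groups` is my own name, not an API from the paper or any library), and the layer widths are illustrative.

```python
def resolve_num_groups(channels, num_groups=None, channels_per_group=None):
    """Return G for a layer, given exactly one of the two settings."""
    if (num_groups is None) == (channels_per_group is None):
        raise ValueError("specify exactly one of num_groups or channels_per_group")
    if channels_per_group is not None:
        if channels % channels_per_group != 0:
            raise ValueError("channels must be divisible by channels_per_group")
        return channels // channels_per_group
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    return num_groups

for C in (64, 128, 256, 512):  # illustrative layer widths
    g_fixed = resolve_num_groups(C, num_groups=32)
    g_from_cpg = resolve_num_groups(C, channels_per_group=16)
    print(f"C={C}: G=32 gives {C // g_fixed} channels/group; "
          f"16 channels/group gives G={g_from_cpg}")
```

With a fixed $G$, the group width grows with the layer width. With fixed channels per group, $G$ grows instead. The extremes are one group holding all channels (LN) and one channel per group (IN).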

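(5) Deeper models

With ResNet-101, GN is worse than BN at batch size 32 but better than BN at batch size 2.

(6) Results and analysis of VGG models
The layer examined is conv5_3 (the last convolutional layer).

Normalization matters quite a bit here, and GN works better than BN.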

5.3 Object Detection and Segmentation in COCO

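Tasks with small batch sizes are where GN does well.

(1) Results of C4 backbone
The backbone's C4 feature map feeds the classification, box regression, and mask heads.

(2) Results of FPN backbone
FPN feeds the classification, box regression, and mask heads.

"long": the training schedule is extended from 180k to 270k iterations.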

(3) Training Mask R-CNN from scratch

Compared with the results in Table 6, training from scratch with GN is also stronger than fine-tuning with BN.

5.4 Video Classification in Kinetics


6 Conclusion (own) / Future work

BN's weakness: BN's error increases rapidly when the batch size becomes smaller. The reason: reducing the batch size can have dramatic impact on the estimated batch statistics.
GN could be used in place of LN and IN and thus is applicable for sequential or generative models
With large batch sizes GN is not as strong as BN; with small batch sizes it is stronger than BN.
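
The first point, that small batches make the estimated batch statistics unreliable, can be seen in a toy simulation using only the Python standard library. The simplifying assumption, which is mine, is that each sample contributes a single standard-normal value, so a "batch mean" is the average of `batch_size` draws. Real BN statistics also pool over spatial positions, so this only illustrates the trend.

```python
import random
import statistics

def spread_of_batch_means(batch_size, trials=2000, seed=0):
    """Standard deviation of the batch mean across many randomly drawn batches."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(batch_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

for bs in (32, 16, 8, 4, 2):
    print(f"batch size {bs:2d}: spread of the estimated mean = "
          f"{spread_of_batch_means(bs):.3f}")
```

Under this assumption the spread grows roughly like $1/\sqrt{N}$ as the batch size $N$ shrinks, so the mean that BN subtracts gets noisier, while GN's per-sample statistics are unaffected.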