I noticed that the BN layer is used in MobileNet.
Something useful:
- Good explanation: https://github.com/BVLC/caffe/issues/3347
- Deep Learning book: http://www.deeplearningbook.org/contents/optimization.html
- Why is the BN layer always followed by a Scale layer in Caffe?
Answer here:
https://stackoverflow.com/questions/41351390/do-i-have-to-use-a-scale-layer-after-every-batchnorm-layer
Conclusion
- Batch normalization can help avoid gradient explosion.
- The Caffe BatchNorm layer has no learnable parameters, which is why a separate Scale layer is needed to supply the learnable scale and shift (I still have some questions here); see the sketch below.
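As a concrete illustration, here is a minimal pycaffe NetSpec sketch of the usual BatchNorm + Scale pairing. It assumes a working Caffe/pycaffe install; the layer names and convolution settings are made up for illustration, only the BatchNorm/Scale pattern is the point.

```python
from caffe import layers as L, NetSpec

n = NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))
# bias_term=False: a bias right before BN is redundant (see the phenomenon below)
n.conv1 = L.Convolution(n.data, num_output=32, kernel_size=3, pad=1,
                        bias_term=False)
# Caffe's BatchNorm layer only normalizes; it has no learnable gamma/beta
n.bn1 = L.BatchNorm(n.conv1, in_place=True)
# Scale with bias_term=True supplies the learnable scale (gamma) and shift (beta)
n.scale1 = L.Scale(n.bn1, bias_term=True, in_place=True)
n.relu1 = L.ReLU(n.scale1, in_place=True)

print(n.to_proto())  # emits the corresponding prototxt
```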
Supplement (filling in the gaps from above)
- After reading some papers, I learned something new:
- What is a batch normalization layer?
  The BN layer first normalizes: each activation has the mini-batch mean (expectation) subtracted and is divided by the square root of the mini-batch variance, x_hat = (x - E[x]) / sqrt(Var[x] + eps). This normalization step is what we call batch normalization.
- But BN is not only normalization. The full BN transform also applies a scale and shift after normalizing: y = gamma * x_hat + beta. This is why we always add a Scale layer after the BN layer in Caffe (a NumPy sketch of the full transform follows below). Why exactly the scale and shift are needed, I may explain next time.
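To make the two stages concrete, here is a minimal NumPy sketch of the BN forward pass in training mode (my own illustration, not Caffe code); gamma and beta are the learnable parameters that Caffe keeps in the separate Scale layer.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """BN forward pass in training mode for a (batch, features) array."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize (Caffe's BatchNorm layer)
    return gamma * x_hat + beta            # scale and shift (Caffe's Scale layer)

x = np.random.randn(8, 4) * 3.0 + 10.0     # toy mini-batch, shifted and scaled
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature (since gamma=1, beta=0)
```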
- An interesting phenomenon worth noticing:
  u + (b + Δb) - E[u + (b + Δb)] = u + b - E[u + b]
  That is, no matter how you update the bias b, the normalized output remains unchanged, so a bias right before a BN layer has no effect (a quick numeric check follows below).
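A quick numeric check of this identity (a toy example of my own): adding any constant update Δb to the bias leaves the mean-subtracted output untouched.

```python
import numpy as np

u = np.random.randn(8)  # pre-bias activations of one unit over a mini-batch
b, db = 0.5, 123.0      # arbitrary bias and an arbitrarily large bias update

out_before = (u + b) - np.mean(u + b)
out_after = (u + (b + db)) - np.mean(u + (b + db))

print(np.allclose(out_before, out_after))  # True: the bias shift cancels out
```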
- Why choose batch normalization?
  Because it works. Many studies report that BN layers accelerate training and can improve performance. In the original BN paper, the authors reported a performance increase on ImageNet.
- Why does BN work?
  That is hard to answer. In the original BN paper, the authors argued that BN reduces internal covariate shift. But some recent research shows that BN does not necessarily alleviate internal covariate shift, and attributes its benefit to other effects, such as a smoother optimization landscape.