I noticed that the BN layer is used in MobileNet.
Something useful:
- Good explanation: https://github.com/BVLC/caffe/issues/3347
- Deep Learning book: http://www.deeplearningbook.org/contents/optimization.html
- Why is the BN layer always followed by a Scale layer in Caffe?
Answer here:
https://stackoverflow.com/questions/41351390/do-i-have-to-use-a-scale-layer-after-every-batchnorm-layer
Conclusion
- Batch normalization can help avoid gradient explosion.
- The Caffe BatchNorm layer has no learnable parameters, which is why a separate Scale layer is needed to supply the learnable scale and shift (I still have some questions here); see the sketch below.
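As a concrete illustration, here is a minimal pycaffe NetSpec sketch of the usual BatchNorm + Scale pairing. It assumes a working Caffe/pycaffe install; the layer names and convolution settings are made up for illustration, only the BatchNorm/Scale pattern is the point.

```python
from caffe import layers as L, NetSpec

n = NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))
# bias_term=False: a bias right before BN is redundant (see the phenomenon below)
n.conv1 = L.Convolution(n.data, num_output=32, kernel_size=3, pad=1,
                        bias_term=False)
# Caffe's BatchNorm layer only normalizes; it has no learnable gamma/beta
n.bn1 = L.BatchNorm(n.conv1, in_place=True)
# Scale with bias_term=True supplies the learnable scale (gamma) and shift (beta)
n.scale1 = L.Scale(n.bn1, bias_term=True, in_place=True)
n.relu1 = L.ReLU(n.scale1, in_place=True)

print(n.to_proto())  # emits the corresponding prototxt
```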
Supplement (filling in the gaps from above)
- After reading some papers, I learned something new:
- What is a batch normalization layer?
  The BN layer first normalizes: each activation has the mini-batch mean (expectation) subtracted and is divided by the square root of the mini-batch variance, x_hat = (x - E[x]) / sqrt(Var[x] + eps). This normalization step is what we call batch normalization.
- But BN is not only normalization. The full BN transform also applies a scale and shift after normalizing: y = gamma * x_hat + beta. This is why we always add a Scale layer after the BN layer in Caffe (a NumPy sketch of the full transform follows below). Why exactly the scale and shift are needed, I may explain next time.
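To make the two stages concrete, here is a minimal NumPy sketch of the BN forward pass in training mode (my own illustration, not Caffe code); gamma and beta are the learnable parameters that Caffe keeps in the separate Scale layer.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """BN forward pass in training mode for a (batch, features) array."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize (Caffe's BatchNorm layer)
    return gamma * x_hat + beta            # scale and shift (Caffe's Scale layer)

x = np.random.randn(8, 4) * 3.0 + 10.0     # toy mini-batch, shifted and scaled
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature (since gamma=1, beta=0)
```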
- An interesting phenomenon worth noticing:
  u + (b + Δb) - E[u + (b + Δb)] = u + b - E[u + b]
  That is, no matter how you update the bias b, the normalized output remains unchanged, so a bias right before a BN layer has no effect (a quick numeric check follows below).
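A quick numeric check of this identity (a toy example of my own): adding any constant update Δb to the bias leaves the mean-subtracted output untouched.

```python
import numpy as np

u = np.random.randn(8)  # pre-bias activations of one unit over a mini-batch
b, db = 0.5, 123.0      # arbitrary bias and an arbitrarily large bias update

out_before = (u + b) - np.mean(u + b)
out_after = (u + (b + db)) - np.mean(u + (b + db))

print(np.allclose(out_before, out_after))  # True: the bias shift cancels out
```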
- Why choose batch normalization?
  Because it works. Many studies report that BN layers accelerate training and can improve performance. In the original BN paper, the authors reported a performance increase on ImageNet.
- Why does BN work?
  That is hard to answer. In the original BN paper, the authors argued that BN reduces internal covariate shift. But some recent research shows that BN does not necessarily alleviate internal covariate shift, and attributes its benefit to other effects, such as a smoother optimization landscape.