BatchNorm and Scale in Caffe

I noticed that the BN layer is used in MobileNet.
Here is something useful:

Conclusion

  • Batch normalization can help avoid gradient explosion.
  • The Caffe BatchNorm layer has no learnable parameters, which is why it is paired with a Scale layer (see the sketch below; I still have some open questions here).
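
Below is a minimal NumPy sketch (my own illustration, not actual Caffe code; the function names and toy numbers are made up) of how Caffe splits the operation across two layers: the BatchNorm layer only normalizes with stored statistics and has nothing to learn, while the Scale layer (with bias_term: true) carries the learnable γ and β.

```python
import numpy as np

def caffe_batchnorm(x, mean, var, eps=1e-5):
    # What Caffe's BatchNorm layer computes: pure normalization.
    # mean/var are batch (training) or running (inference) statistics,
    # not learnable parameters.
    return (x - mean) / np.sqrt(var + eps)

def caffe_scale(x_hat, gamma, beta):
    # What the following Scale layer computes: the learnable
    # scale (gamma) and shift (beta) from the BN paper.
    return gamma * x_hat + beta

# Toy example: one channel's activations.
x = np.array([0.5, 2.0, -1.0, 3.5])
x_hat = caffe_batchnorm(x, mean=x.mean(), var=x.var())
y = caffe_scale(x_hat, gamma=1.5, beta=0.1)
print(x_hat, y)
```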

Supplement (filling in the gaps)

  • After reading some papers, I learned something new:
  1. What is a batch normalization layer?
    The BN layer looks like this: each activation has the batch expectation subtracted and is then divided by the batch standard deviation,
    x̂ = (x − E[x]) / sqrt(Var[x] + ε),
    and we call this batch normalization.

  2. It is not only normalization. Indeed, the full BN transform looks like this:
    y = γ · x̂ + β.
    The normalization is followed by a scale and a shift, which is why we always add a Scale layer after the BN layer in Caffe (see the first sketch after this list). I may explain why we need the scale layer next time.

  3. An interesting phenomenon worth noticing:
    u + (b + Δb) − E[u + (b + Δb)] = u + b − E[u + b].
    That is, no matter how you update the bias b, the normalized output remains unchanged (see the second sketch after this list).

  4. Why choose batch normalization?
    The answer is that it works. Much research reports that the BN layer helps accelerate training and can improve performance. In the original BN paper, the authors reported an accuracy improvement on ImageNet.

  5. Why does BN work?
    This is hard to answer. In the original BN paper, the authors argued that the BN layer reduces internal covariate shift, but some recent research shows that the BN layer may not actually alleviate internal covariate shift.
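
To make items 1 and 2 concrete, here is a small NumPy sketch (my own, with made-up shapes and numbers) of the training-time BN transform: compute the mean and variance over the mini-batch per channel, normalize, then apply the learnable scale and shift.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # Training-time BN for an (N, C) mini-batch, per channel:
    # normalize with the batch statistics, then scale and shift.
    mu = x.mean(axis=0)                  # E[x] over the mini-batch
    var = x.var(axis=0)                  # Var[x] over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta, x_hat

x = np.random.randn(8, 4) * 3.0 + 5.0    # 8 samples, 4 channels, shifted and scaled
y, x_hat = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))

# After normalization, each channel has (approximately) zero mean and unit variance.
print(x_hat.mean(axis=0), x_hat.std(axis=0))
```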

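And here is a quick numerical check of the identity in item 3 (again just a NumPy sketch with arbitrary numbers): any constant bias added before the normalization is cancelled by the mean subtraction, so updating it changes nothing.

```python
import numpy as np

def mean_subtract(x):
    # Only the mean-subtraction part of BN, as in the identity above.
    return x - x.mean()

u = np.array([1.0, -2.0, 0.5, 3.0])
b, delta_b = 0.7, 123.4                  # original bias and an arbitrary update

lhs = mean_subtract(u + (b + delta_b))   # u + (b + Δb) − E[u + (b + Δb)]
rhs = mean_subtract(u + b)               # u + b − E[u + b]
print(np.allclose(lhs, rhs))             # True: the bias update has no effect
```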