Changing BN layer settings between training and testing

Detailed description of BatchNormLayer (from the Caffe documentation)

“Normalizes the input to have 0-mean and/or unit (1) variance across the batch.

This layer computes Batch Normalization as described in [1]. For each channel in the data (i.e. axis 1), it subtracts the mean and divides by the variance, where both statistics are computed across both spatial dimensions and across the different examples in the batch.

By default, during training time, the network is computing global mean/variance statistics via a running average, which is then used at test time to allow deterministic outputs for each input. You can manually toggle whether the network is accumulating or using the statistics via the use_global_stats option. For reference, these statistics are kept in the layer’s three blobs: (0) mean, (1) variance, and (2) moving average factor.”
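For reference, the computation behind this description can be written out as below. This is a sketch of my reading of Caffe's BatchNormLayer, not a quote from its documentation: λ stands for moving_average_fraction (default 0.999), ε for eps (default 1e-5), m is the number of values per channel in the batch (N×H×W), and the three blobs are the accumulated mean μ̂ (0), the accumulated variance σ̂² (1) and the moving average factor s (2), all initialized to zero. Note that the input is divided by the standard deviation √(σ² + ε), not by the variance as the quoted text says. Caffe's BatchNorm layer also has no learnable scale or shift; those come from a following Scale layer.

```latex
\begin{aligned}
&\text{Training (use\_global\_stats = false):}\\
&\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
 \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad
 y_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\\
&s \leftarrow \lambda s + 1, \qquad
 \hat\mu \leftarrow \lambda \hat\mu + \mu_B, \qquad
 \hat\sigma^2 \leftarrow \lambda \hat\sigma^2 + \frac{m}{m-1}\,\sigma_B^2\\[4pt]
&\text{Testing (use\_global\_stats = true):}\\
&\mu = \frac{\hat\mu}{s}, \qquad \sigma^2 = \frac{\hat\sigma^2}{s}, \qquad
 y = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}
 \qquad (\mu = \sigma^2 = 0 \text{ if } s = 0)
\end{aligned}
```

Dividing by s undoes the geometric weighting, so μ and σ² are properly normalized running averages of the per-batch statistics.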

Problem encountered

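The network was a plain stack of convolution layers. Without BN layers it already trained effectively and ran normally, but after adding BN layers the loss stayed stuck at 87.3365 and never converged.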
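The specific value 87.3365 is not arbitrary. Caffe's SoftmaxWithLoss layer clamps the predicted probability of the true label at FLT_MIN (the smallest normalized single-precision float, 2^-126) before taking the log, so the largest per-sample loss it can report is:

```latex
-\ln(\mathrm{FLT\_MIN}) = -\ln\!\left(2^{-126}\right) = 126\ln 2 \approx 87.3365
```

Since the reported loss is an average over the samples in the batch, a loss pinned at exactly 87.3365 means essentially every sample is assigning (close to) zero probability to its label, which typically points to activations that have blown up rather than to a network that is merely learning slowly.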

Fix for the loss of 87.3365 after adding BN layers

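The BN layer has a parameter use_global_stats. During training it must be set to false, so that the BN layer computes the mean and variance of each batch and keeps updating its running statistics; if it is set to true, the layer uses the stored statistics instead, which stay fixed at their initial values and are never updated. In Caffe those stored statistics start at zero, so each BN layer then divides its input by √ε ≈ 0.0032 (with the default eps of 1e-5), i.e. scales it up roughly 316 times, and the activations explode across the stacked layers. At test time the parameter should be set to true. After correcting this parameter throughout the network, training ran normally.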
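A minimal sketch of what this looks like in a combined train/test prototxt, assuming a convolution layer named conv1 (all layer names here are placeholders). Both BN layers carry the same name so that the test net picks up the statistics learned by the training net; the three param { lr_mult: 0 } entries keep the solver from modifying the statistics blobs (recent Caffe versions add these automatically), and the Scale layer supplies the learnable per-channel scale and shift that Caffe's BatchNorm layer lacks.

```prototxt
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"                     # in-place
  include { phase: TRAIN }
  batch_norm_param { use_global_stats: false }  # batch stats, update running averages
  param { lr_mult: 0 }             # blob 0: accumulated mean
  param { lr_mult: 0 }             # blob 1: accumulated variance
  param { lr_mult: 0 }             # blob 2: moving average factor
}
layer {
  name: "conv1/bn"                 # same name, so trained statistics are shared
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  include { phase: TEST }
  batch_norm_param { use_global_stats: true }   # use the stored running averages
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }  # learnable gamma (scale) and beta (bias)
}
```

If use_global_stats is omitted altogether, Caffe chooses it from the phase (false in TRAIN, true in TEST), so a single BatchNorm layer without the option also behaves correctly.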

Solutions for other causes of non-convergence (e.g. loss stuck at 87.3365, loss staying high)

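  1. Set debug_info: true in the solver and inspect the data and diff values of each layer; in this situation they are usually NaN (not a number) or INF (infinity).
  2. Check that the data labels start from 0 and are contiguous.
  3. Lower the learning rate base_lr.
  4. Data problems: the LMDB may have been generated incorrectly.
  5. The intermediate layers are not normalized, so after a few layers the output values have already become very small, and the gradients computed from them are no longer useful; try adding a BN layer and a Scale layer after each convolution layer.
  6. Lower base_lr and increase the batch size.
  7. Normalize the input images in the data layer from 0 to 255 down to roughly 0 to 1, using this parameter: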
    transform_param {
      scale: 0.00390625  # pixel normalization: 0.00390625 = 1/256 (use 0.0039215686 for exactly 1/255)
    }
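  8. The network has too many parameters or is too deep: try removing a few layers; with little data, you may also need to reduce num_output in the intermediate layers.
  9. Remember to shuffle the data; otherwise it is not random enough and consecutive batches differ very little from each other.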
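Several of these checks are plain configuration changes. The fragment below is a sketch that gathers the debugging, learning-rate, batch-size, input-scaling and shuffling suggestions in one place; the file names and numeric values are placeholders, not recommendations. The shuffle option shown belongs to the ImageData layer; an LMDB-backed Data layer reads records in stored order, so for LMDB the shuffling has to happen when the database is built (for example with convert_imageset --shuffle).

```prototxt
# --- solver.prototxt (fragment) ---
base_lr: 0.001        # placeholder value: lower it if the loss diverges
debug_info: true      # log per-layer data/diff magnitudes to spot NaN/INF

# --- train_val.prototxt (fragment) ---
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    scale: 0.00390625           # 1/256: maps pixels 0..255 into [0, 1)
  }
  image_data_param {
    source: "train_list.txt"    # hypothetical list file, one "path label" per line
    batch_size: 64              # placeholder: raise it if memory allows
    shuffle: true               # shuffle the list at start and after every epoch
  }
}
```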