On MXNet's for_training and is_train

  • forward(is_train):
    is_train is not related to memory saving; it only changes the runtime behavior of certain operators. Currently it mainly matters for operators such as Dropout and BatchNorm.

    • During testing, Dropout leaves the input unchanged; during training it zeroes each element with probability p and scales the surviving elements by 1/(1-p).

      import mxnet as mx

      mx.random.seed(998)
      input_array = mx.nd.array([[3., 0.5, -0.5, 2., 7. ],
                                 [2., -0.4,  7., 3., 0.2]])
      a = mx.sym.Variable('a')
      dropout = mx.sym.Dropout(a, p=0.2)
      executor = dropout.simple_bind(ctx=mx.cpu(), a=input_array.shape)


      ## If training: dropped elements become 0, the rest are scaled by 1/(1-0.2) = 1.25

      executor.forward(is_train=True, a=input_array)
      executor.outputs
      [[ 3.75   0.625 -0.     2.5    8.75 ]
       [ 2.5   -0.5    8.75   3.75   0.   ]]


      ## If testing: the output equals the input

      executor.forward(is_train=False, a=input_array)
      executor.outputs
      [[ 3.     0.5   -0.5    2.     7.   ]
       [ 2.    -0.4    7.     3.     0.2  ]]
  • save model

    model.save_checkpoint('unet', num_epoch, save_optimizer_states=False)
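
    A minimal runnable sketch of the save side, assuming `model` is an mx.mod.Module that has been bound and initialized; the FullyConnected network below is only a placeholder, not the actual U-Net:

      import mxnet as mx

      # Placeholder network; in practice this would be the trained U-Net symbol
      sym = mx.sym.FullyConnected(mx.sym.Variable('data'), num_hidden=10)
      model = mx.mod.Module(symbol=sym, data_names=['data'], label_names=None)
      model.bind(data_shapes=[('data', (6, 1, 64, 64))])
      model.init_params()

      num_epoch = 0
      # Writes 'unet-symbol.json' and 'unet-%04d.params' % num_epoch
      model.save_checkpoint('unet', num_epoch, save_optimizer_states=False)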
  • load model

    unet = mx.mod.Module.load('unet', 0)
    unet.bind(for_training=False, data_shapes=[('data', (6, 1, 64, 64))])
    • bind(for_training): for_training determines whether gradient buffers are allocated and gradients are computed for the bound executors; in addition, when forward() is called without an explicit is_train, is_train defaults to for_training. A fuller inference sketch follows below.
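
    A minimal end-to-end inference sketch, assuming the checkpoint files written above ('unet-symbol.json', 'unet-0000.params') exist; the all-zero input batch is only illustrative:

      import mxnet as mx

      unet = mx.mod.Module.load('unet', 0)                    # restores symbol + parameters
      unet.bind(for_training=False,                           # no gradient buffers are allocated
                data_shapes=[('data', (6, 1, 64, 64))])

      batch = mx.io.DataBatch(data=[mx.nd.zeros((6, 1, 64, 64))])
      # is_train is not given, so it falls back to for_training (False here): operators
      # like Dropout and BatchNorm run in test mode.
      unet.forward(batch)
      out = unet.get_outputs()[0]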
  • BatchNorm

    • The is_train flag passed to forward() changes how BatchNorm is computed:
      • is_train = True: training phase. Global statistics are not used (do not set use_global_stats during training, otherwise the network will not converge). BN is applied per mini-batch: the mean and variance are computed from the current mini-batch.
      • is_train = False: testing phase. The moving-average statistics are used instead; during training the mean and variance of each batch are recorded, so that after training the overall moving_mean and moving_var are what normalize the data, as in the batchnorm_v1 snippet below (followed by a small Python sketch).
      if (ctx.is_train && !param_.use_global_stats) {
        // Training path: normalize with the current mini-batch mean/var
        Tensor<xpu, 1> mean = out_data[batchnorm_v1::kMean].get<xpu, 1, real_t>(s);
        Tensor<xpu, 1> var = out_data[batchnorm_v1::kVar].get<xpu, 1, real_t>(s);
        CHECK(req[batchnorm_v1::kMean] == kNullOp || req[batchnorm_v1::kMean] == kWriteTo);
        CHECK(req[batchnorm_v1::kVar] == kNullOp || req[batchnorm_v1::kVar] == kWriteTo);
        // The first three steps must be enforced.
        mean = scale * sumall_except_dim<1>(data);
        var = scale * sumall_except_dim<1>(F<mshadow_op::square>(
            data - broadcast<1>(mean, data.shape_)));
        Assign(out, req[batchnorm_v1::kOut], broadcast<1>(slope, out.shape_) *
               (data - broadcast<1>(mean, data.shape_)) /
               F<mshadow_op::square_root>(broadcast<1>(var + param_.eps, data.shape_)) +
               broadcast<1>(bias, out.shape_));
      } else {
        // Test path (or use_global_stats): normalize with the moving mean/var
        Assign(out, req[batchnorm_v1::kOut], broadcast<1>(slope /
                                             F<mshadow_op::square_root>(moving_var + param_.eps),
                                             data.shape_) * data +
               broadcast<1>(bias - (slope * moving_mean) /
                            F<mshadow_op::square_root>(moving_var + param_.eps), data.shape_));
        // Set mean and var tensors to their moving values
        Tensor<xpu, 1> mean = out_data[batchnorm_v1::kMean].get<xpu, 1, real_t>(s);
        Tensor<xpu, 1> var = out_data[batchnorm_v1::kVar].get<xpu, 1, real_t>(s);
        mean = F<mshadow_op::identity>(moving_mean);
        var  = F<mshadow_op::identity>(moving_var);
      }
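
    A small Python sketch of the same behavior; the shapes and the hand-set gamma/beta and moving statistics are only illustrative, chosen so the executor state is well defined:

      import mxnet as mx

      x = mx.sym.Variable('x')
      bn = mx.sym.BatchNorm(x, fix_gamma=False, name='bn')
      exe = bn.simple_bind(ctx=mx.cpu(), x=(4, 3, 8, 8))
      exe.arg_dict['bn_gamma'][:] = 1
      exe.arg_dict['bn_beta'][:] = 0
      exe.aux_dict['bn_moving_mean'][:] = 0
      exe.aux_dict['bn_moving_var'][:] = 1

      data = mx.nd.random.normal(2, 3, shape=(4, 3, 8, 8))   # mean 2, std 3

      # Test phase: moving_mean/moving_var (0 and 1 here) are used, so the data
      # passes through essentially unchanged.
      y_test = exe.forward(is_train=False, x=data)[0]
      print(y_test.mean())    # ~2

      # Training phase: the current mini-batch mean/var are used, so the output is
      # normalized to roughly zero mean and unit variance.
      y_train = exe.forward(is_train=True, x=data)[0]
      print(y_train.mean())   # ~0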

Summary

  • bind(for_training) -> forward(is_train) -> BatchNorm/Dropout behavior: for_training set at bind time becomes the default for is_train in forward(), which in turn selects the training or testing behavior of the operators.
