Training Binary Neural Networks -- An Empirical Study of Binary Neural Networks' Optimisation

An Empirical Study of Binary Neural Networks' Optimisation
ICLR 2019
https://github.com/mi-lad/studying-binary-neural-networks

The main conclusions of the paper are as follows:

  1. Use ADAM for optimising the objective.
  2. Do not use early stopping.
  3. Split the training into two stages.
  4. Remove gradient and weight clipping in the first stage.
  5. Reduce the averaging rate in the Batch Normalisation layers in the second stage.

Two forms of clipping are commonly used when training binary networks:
Gradient clipping: the gradient passed through the sign operation is zeroed once the underlying weight moves outside a given range.
Weight clipping: the weight values themselves are kept within a fixed range (typically [-1, 1]).
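As a concrete illustration, here is a minimal PyTorch-style sketch of the two operations (the paper's repository is in TensorFlow, so the helper names here are my own, not the authors' code): weight clipping clamps the full-precision weights back into [-1, 1] after every update, while gradient clipping zeroes the gradient of any weight that has already left that range.

```python
import torch

def clip_weights_(model, lo=-1.0, hi=1.0):
    # Weight clipping: keep every full-precision weight inside [lo, hi].
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(lo, hi)

def clip_gradients_(model, threshold=1.0):
    # Gradient clipping (as used with the STE): zero the gradient wherever
    # the corresponding weight is already outside [-threshold, threshold].
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_((p.abs() <= threshold).to(p.grad.dtype))
```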

In the forward path (and at the end of training), the full-precision weights are binarised with the sign function:

$$w_b = \mathrm{sign}(w)$$

The STE with gradient clipping provides an estimate for the gradient of this operation:

$$\frac{\partial L}{\partial w} \approx \frac{\partial L}{\partial w_b} \cdot \mathbf{1}_{|w| \le 1}$$
[Figure: (a) binary convolution kernels; right: forward/backward paths through the full-precision proxy]
How are the binary convolution kernels in panel (a) obtained? They are produced by binarising a full-precision proxy with the sign function, which corresponds to the forward path in the right-hand figure. And where does the full-precision proxy itself come from? It is learned through the STE estimator, which corresponds to the backward path in the right-hand figure.
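Putting the pieces together, a binary convolution with a full-precision proxy might look like the following minimal PyTorch-style sketch (my own illustrative code, not the TensorFlow implementation in the repository): the clamp/detach trick makes the forward pass use sign(w) while the backward pass receives the clipped straight-through gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize_ste(w):
    """sign() in the forward pass; clipped straight-through gradient in the backward.
    clamp() has gradient 1 inside [-1, 1] and 0 outside, and the detach() makes the
    forward value equal to the sign while leaving that gradient untouched."""
    w_b = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
    w_c = w.clamp(-1.0, 1.0)
    return w_c + (w_b - w_c).detach()

class BinaryConv2d(nn.Module):
    """Convolution whose weights are binarised from a full-precision proxy."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        # The full-precision proxy is what the optimiser actually updates.
        self.proxy = nn.Parameter(0.05 * torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.stride, self.padding = stride, padding

    def forward(self, x):
        w_b = binarize_ste(self.proxy)  # binary weights used in the forward pass
        return F.conv2d(x, w_b, stride=self.stride, padding=self.padding)
```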

3.1 Impact of Optimiser
A possible hypothesis is that the early stages of training binary models require more averaging for the optimiser to proceed in the presence of the binarisation operation. On the other hand, in the late stages of training, we rely on noisier sources to increase the exploration power of the optimiser.

Overall, ADAM comes out ahead.

3.2 Impact of gradient and weight clipping
Gradient and weight clipping have little impact on the final accuracy of a binary network, but they do noticeably affect how quickly training converges.

The paper revisits the well-known observation that training a binary model is often notably slower than its non-binary counterpart.

The slowdown is mainly caused by the commonly applied gradient and weight clipping, since they keep parameters within the [-1, 1] range at all times during training.

At the same time, weight and gradient clipping help achieve better final accuracy.

The authors test this trade-off by training a binary model in two stages: (1) using vanilla STE (no clipping) with a higher learning rate in the first stage, and (2) turning clipping back on and reducing the learning rate once the accuracy stops improving. A sketch of this schedule follows below.
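Here is a rough PyTorch-style sketch of that two-stage schedule, with hypothetical epoch counts, learning rates and BN momentum (the paper chooses its values empirically and switches stages when the validation accuracy plateaus rather than after a fixed number of epochs):

```python
import torch
import torch.nn as nn

def clip_weights_(model, lo=-1.0, hi=1.0):
    # Same weight-clipping helper as in the earlier sketch.
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(lo, hi)

def train_two_stage(model, train_loader, loss_fn,
                    epochs_stage1=60, epochs_stage2=40):  # hypothetical epoch counts
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # hypothetical stage-1 learning rate

    # Stage 1: vanilla STE, no weight/gradient clipping, higher learning rate.
    for _ in range(epochs_stage1):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()                                     # no clipping applied here

    # Stage 2: clipping back on, lower learning rate, reduced BN averaging rate.
    for g in opt.param_groups:
        g["lr"] = 1e-4                                     # hypothetical stage-2 learning rate
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.momentum = 0.01  # assumption: "reduced averaging rate" mapped to a smaller BN momentum
    for _ in range(epochs_stage2):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            # Gradient clipping in the STE sense is re-enabled inside the
            # binarisation op (e.g. by switching to the clipped STE sketch above).
            opt.step()
            clip_weights_(model)                           # weight clipping after each update
```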
