Training Binary Neural Networks -- An Empirical Study of Binary Neural Networks' Optimisation

An Empirical Study of Binary Neural Networks' Optimisation
ICLR 2019
https://github.com/mi-lad/studying-binary-neural-networks

The main conclusions of the paper are as follows:

  1. Use ADAM for optimising the objective.
  2. Do not use early stopping.
  3. Split the training into two stages.
  4. Remove gradient and weight clipping in the first stage.
  5. Reduce the averaging rate in the Batch Normalisation layers in the second stage.

Two forms of clipping are commonly used when training binary networks:
Gradient clipping: the gradient passed through the sign operation is zeroed once the underlying weight moves outside a given range.
Weight clipping: the weight values themselves are kept within a fixed range (typically [-1, 1]).
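As a concrete illustration, here is a minimal PyTorch-style sketch of the two operations (the paper's repository is in TensorFlow, so the helper names here are my own, not the authors' code): weight clipping clamps the full-precision weights back into [-1, 1] after every update, while gradient clipping zeroes the gradient of any weight that has already left that range.

```python
import torch

def clip_weights_(model, lo=-1.0, hi=1.0):
    # Weight clipping: keep every full-precision weight inside [lo, hi].
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(lo, hi)

def clip_gradients_(model, threshold=1.0):
    # Gradient clipping (as used with the STE): zero the gradient wherever
    # the corresponding weight is already outside [-threshold, threshold].
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_((p.abs() <= threshold).to(p.grad.dtype))
```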

In the forward path (and at the end of training), the full-precision weights are binarised with the sign function:

$$w_b = \mathrm{sign}(w)$$

The STE with gradient clipping provides an estimate for the gradient of this operation:

$$\frac{\partial L}{\partial w} \approx \frac{\partial L}{\partial w_b} \cdot \mathbf{1}_{|w| \le 1}$$
[Figure: (a) binary convolution kernels; right: forward/backward paths through the full-precision proxy]
How are the binary convolution kernels in panel (a) obtained? They are produced by binarising a full-precision proxy with the sign function, which corresponds to the forward path in the right-hand figure. And where does the full-precision proxy itself come from? It is learned through the STE estimator, which corresponds to the backward path in the right-hand figure.
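Putting the pieces together, a binary convolution with a full-precision proxy might look like the following minimal PyTorch-style sketch (my own illustrative code, not the TensorFlow implementation in the repository): the clamp/detach trick makes the forward pass use sign(w) while the backward pass receives the clipped straight-through gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize_ste(w):
    """sign() in the forward pass; clipped straight-through gradient in the backward.
    clamp() has gradient 1 inside [-1, 1] and 0 outside, and the detach() makes the
    forward value equal to the sign while leaving that gradient untouched."""
    w_b = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
    w_c = w.clamp(-1.0, 1.0)
    return w_c + (w_b - w_c).detach()

class BinaryConv2d(nn.Module):
    """Convolution whose weights are binarised from a full-precision proxy."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        # The full-precision proxy is what the optimiser actually updates.
        self.proxy = nn.Parameter(0.05 * torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.stride, self.padding = stride, padding

    def forward(self, x):
        w_b = binarize_ste(self.proxy)  # binary weights used in the forward pass
        return F.conv2d(x, w_b, stride=self.stride, padding=self.padding)
```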

3.1 Impact of Optimiser
A possible hypothesis is that the early stages of training binary models require more averaging for the optimiser to proceed in the presence of the binarisation operation. On the other hand, in the late stages of training, we rely on noisier sources to increase the exploration power of the optimiser.

Overall, ADAM comes out ahead.

3.2 Impact of gradient and weight clipping
Gradient and weight clipping have little impact on the final accuracy of a binary network, but they do noticeably affect how quickly training converges.

The paper revisits the well-known observation that training a binary model is often notably slower than its non-binary counterpart.

The slowdown is mainly caused by the commonly applied gradient and weight clipping, since they keep parameters within the [-1, 1] range at all times during training.

At the same time, weight and gradient clipping help achieve better final accuracy.

The authors test this trade-off by training a binary model in two stages: (1) using vanilla STE (no clipping) with a higher learning rate in the first stage, and (2) turning clipping back on and reducing the learning rate once the accuracy stops improving. A sketch of this schedule follows below.
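Here is a rough PyTorch-style sketch of that two-stage schedule, with hypothetical epoch counts, learning rates and BN momentum (the paper chooses its values empirically and switches stages when the validation accuracy plateaus rather than after a fixed number of epochs):

```python
import torch
import torch.nn as nn

def clip_weights_(model, lo=-1.0, hi=1.0):
    # Same weight-clipping helper as in the earlier sketch.
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(lo, hi)

def train_two_stage(model, train_loader, loss_fn,
                    epochs_stage1=60, epochs_stage2=40):  # hypothetical epoch counts
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # hypothetical stage-1 learning rate

    # Stage 1: vanilla STE, no weight/gradient clipping, higher learning rate.
    for _ in range(epochs_stage1):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()                                     # no clipping applied here

    # Stage 2: clipping back on, lower learning rate, reduced BN averaging rate.
    for g in opt.param_groups:
        g["lr"] = 1e-4                                     # hypothetical stage-2 learning rate
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.momentum = 0.01  # assumption: "reduced averaging rate" mapped to a smaller BN momentum
    for _ in range(epochs_stage2):
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            # Gradient clipping in the STE sense is re-enabled inside the
            # binarisation op (e.g. by switching to the clipped STE sketch above).
            opt.step()
            clip_weights_(model)                           # weight clipping after each update
```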
