[Pruning] To prune, or not to prune: exploring the efficacy of pruning for model compression

onion_rain

Category: Model Compression

Article link: https://blog.csdn.net/onion_rain/article/details/112795686

1 Introduction

The question the paper asks: given a bound on the model's memory footprint, how can we arrive at the most accurate model?

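The authors compare two kinds of models with the same memory footprint:

(1) large-sparse: a large model pruned to a high sparsity
(2) small-dense: a smaller dense model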

2 Related work

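Early work includes LeCun et al.'s OBD (Optimal Brain Damage), among others.
Recent work has mostly focused on weight pruning; building on it, the authors propose AGP (automated gradual pruning).
There is also structured pruning, which can speed up inference but may not be directly extensible to other NN architectures.
Other approaches include quantization, low-rank matrix factorization, and group sparsity regularization.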

3 Methods

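First, train the model for $t_0$ steps. Then, every $\Delta t$ steps, compute the new sparsity value $s_t$, update the binary weight masks accordingly, and prune. This is repeated for $n$ pruning steps, i.e. over $n\Delta t$ training steps in total; $\Delta t$ is typically set between 100 and 1000 steps. Once the target sparsity $s_f$ is reached, the masks are no longer updated.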
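The mask update performed at each pruning step can be sketched as follows. This is a minimal pure-Python sketch that assumes magnitude-based masking (zeroing the weights with the smallest absolute values until the requested sparsity is reached), which is how the paper chooses its masks. The function names `magnitude_mask` and `apply_mask` are hypothetical, and a real implementation would operate on framework tensors rather than Python lists.

```python
from typing import List


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Binary mask that zeroes the smallest-magnitude weights.

    round(sparsity * len(weights)) entries are set to 0 (pruned) and the
    rest to 1 (kept). Ties are broken by index order (sorted is stable).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    num_pruned = int(round(sparsity * len(weights)))
    # Indices ordered by absolute weight value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:num_pruned]:
        mask[i] = 0
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Elementwise product: pruned weights become zero."""
    return [w * m for w, m in zip(weights, mask)]
```

For example, with weights `[0.5, -0.05, 0.3, -0.8, 0.01, 0.2]` and sparsity 0.5, the three smallest-magnitude entries (indices 1, 4 and 5) are masked, giving the mask `[1, 0, 1, 1, 0, 0]`. Because pruned weights are stored as zeros after `apply_mask`, they sort first on the next update and stay pruned as the target sparsity grows.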

The sparsity value at training step $t$ is given by:

$s_t = s_f + (s_i - s_f)\left(1 - \frac{t - t_0}{n\Delta t}\right)^3 \quad \text{for}\quad t\in \{t_0,\ t_0+\Delta t,\ \ldots,\ t_0+n\Delta t\} \qquad (1)$

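$s_i$ is the initial sparsity value (usually 0)
$s_f$ is the final (target) sparsity value
$s_t$ is the sparsity value at step $t$
$n$ is the total number of pruning steps (each pruning step spans $\Delta t$ training steps)
$\Delta t$ is the pruning frequency (unit: training steps)
$t_0$ is the training step at which pruning starts (unit: training steps)
$t$ is the current training step (unit: training steps)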
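Eq. (1) translates directly into a small helper that returns the target sparsity for any training step. This is a hedged sketch. The name `agp_sparsity` is hypothetical. The behavior off the pruning grid is an assumption consistent with masks being updated only every $\Delta t$ steps: it returns $s_i$ before $t_0$, holds the value of the most recent pruning step between updates, and stays at $s_f$ after $t_0 + n\Delta t$.

```python
def agp_sparsity(t: int, s_i: float, s_f: float, t0: int, n: int, dt: int) -> float:
    """Target sparsity at training step t according to Eq. (1).

    Assumption: masks change only every dt steps, so t is floored to the
    most recent pruning step t0 + a*dt, with a clamped to [0, n].
    """
    if n <= 0 or dt <= 0:
        raise ValueError("n and dt must be positive")
    if t < t0:
        return s_i  # pruning has not started yet
    a = min(n, (t - t0) // dt)  # index of the latest pruning step
    progress = (a * dt) / (n * dt)  # (t - t0) / (n * dt) on the pruning grid
    return s_f + (s_i - s_f) * (1.0 - progress) ** 3
```

For example, with $s_i = 0$, $s_f = 0.9$, $t_0 = 1000$, $n = 10$ and $\Delta t = 100$, `agp_sparsity(1550, 0.0, 0.9, 1000, 10, 100)` uses the pruning step at $t = 1500$ ($a = 5$) and returns approximately 0.7875.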

Equivalently, let $t = t_0 + a\Delta t$; then Eq. (1) can be rewritten as:

$s_t = s_f + (s_i - s_f)\left(1 - \frac{a}{n}\right)^3 \quad \text{for}\quad a\in \{0,\ 1,\ \ldots,\ n\} \qquad (2)$
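The substitution only rewrites the progress term. The $\Delta t$ factors cancel, and the grid of pruning steps maps one-to-one onto the integer indices $a$:

```latex
t = t_0 + a\Delta t
\;\Rightarrow\;
\frac{t - t_0}{n\Delta t} = \frac{a\Delta t}{n\Delta t} = \frac{a}{n},
\qquad
t \in \{t_0,\ t_0+\Delta t,\ \ldots,\ t_0+n\Delta t\}
\;\Longleftrightarrow\;
a \in \{0,\ 1,\ \ldots,\ n\}
```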
Since $s_i = 0$ in the usual case, Eq. (2) simplifies further to:

$s_t = s_f\left[1 - \left(1 - \frac{a}{n}\right)^3\right] \quad \text{for}\quad a\in \{0,\ 1,\ \ldots,\ n\} \qquad (3)$
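As a worked example, take $s_f = 0.9$ and $n = 10$ in Eq. (3). These are illustrative values, not taken from the paper:

```latex
\begin{aligned}
a = 1:&\quad s_t = 0.9\left[1 - (1 - 0.1)^3\right] = 0.9 \times 0.271 = 0.2439\\
a = 5:&\quad s_t = 0.9\left[1 - (1 - 0.5)^3\right] = 0.9 \times 0.875 = 0.7875\\
a = 10:&\quad s_t = 0.9\left[1 - (1 - 1)^3\right] = 0.9
\end{aligned}
```

After the first of ten pruning steps the model is already about 24% sparse. Halfway through the schedule it has reached 87.5% of the target sparsity.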
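The sparsity function in these equations prunes the network rapidly in the initial phase, when redundant connections are abundant. It then gradually reduces the number of weights pruned at each step as fewer and fewer weights remain in the network.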
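The claim that each pruning step removes fewer weights than the previous one can be checked numerically. The sketch below evaluates Eq. (3) at every pruning step and prints how much sparsity each step adds. The helper names `agp_schedule` and `increments` are hypothetical, and the values $s_f = 0.9$, $n = 10$ are illustrative only.

```python
from typing import List


def agp_schedule(s_f: float, n: int) -> List[float]:
    """Sparsity s_t for pruning steps a = 0..n from Eq. (3), assuming s_i = 0."""
    return [s_f * (1.0 - (1.0 - a / n) ** 3) for a in range(n + 1)]


def increments(schedule: List[float]) -> List[float]:
    """Additional sparsity introduced at each pruning step a = 1..n."""
    return [later - earlier for earlier, later in zip(schedule, schedule[1:])]


if __name__ == "__main__":
    sched = agp_schedule(s_f=0.9, n=10)
    for a, (s, d) in enumerate(zip(sched[1:], increments(sched)), start=1):
        print(f"step a={a:2d}: s_t={s:.4f}  (+{d:.4f})")
```

The first step adds 0.2439 while the last adds only 0.0009. In general the increment $s_f\left[(1 - \frac{a-1}{n})^3 - (1 - \frac{a}{n})^3\right]$ shrinks as $a$ grows, because $x^3$ is convex on $[0, 1]$.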