Paper:
DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING
Link:
https://arxiv.org/pdf/1510.00149.pdf
First, we prune the network by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored. Finally, we apply Huffman coding to take advantage of the biased distribution of effective weights.
This post walks through the paper's three-stage model-compression pipeline:
pruning, trained quantization, and Huffman coding.
Pruning
Applying pruning itself is fairly simple; the hard part is using the pruned model for inference. The authors store the pruned parameters in compressed sparse column (CSC) format,
so only the surviving nonzero weights and their coordinates need to be stored.
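As a rough sketch (the threshold value and matrix shape here are illustrative, not from the paper), magnitude pruning and value-plus-coordinate storage might look like:

```python
import numpy as np

# Illustrative magnitude pruning: keep only weights above a threshold,
# then store just the surviving values and their coordinates
# (the idea behind CSC/CSR sparse storage).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)

threshold = 0.8                    # assumed threshold, not from the paper
mask = np.abs(W) >= threshold
values = W[mask]                   # surviving (informative) weights
rows, cols = np.nonzero(mask)      # their coordinates

# Inference can scatter the stored values back into a dense matrix
# (or multiply in sparse form directly):
W_pruned = np.zeros_like(W)
W_pruned[rows, cols] = values
```

In a real CSC layout the column pointers replace one coordinate array, and the paper further stores relative rather than absolute indices, but the stored content is the same: values plus positions.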
TRAINED QUANTIZATION AND WEIGHT SHARING
This part covers two ideas: quantization and weight sharing.
We limit the number of effective weights we need to store by having multiple connections share the same weight, and then fine-tune those shared weights.
Weight sharing
We use k-means clustering to identify the shared weights for each layer of a trained network, so that all the weights that fall into the same cluster will share the same weight.
Weight sharing here is done by clustering a layer's weights with k-means, and then using each cluster's centroid as the shared weight for that cluster.
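A minimal sketch of this step, with a hand-rolled Lloyd's-iteration k-means so it stays self-contained (the codebook size `k`, iteration count, and data are illustrative):

```python
import numpy as np

# Cluster a layer's weights with k-means; afterwards only the codebook
# (centroids) and the per-weight cluster indices need to be stored.
rng = np.random.default_rng(0)
weights = rng.normal(size=256).astype(np.float32)
k = 8                                           # assumed codebook size

# Forgy-style init: pick k of the weights as starting centroids
centroids = rng.choice(weights, size=k, replace=False)
for _ in range(20):                             # plain Lloyd iterations
    idx = np.argmin(np.abs(weights[:, None] - centroids[None, :]), axis=1)
    for c in range(k):
        if np.any(idx == c):                    # skip empty clusters
            centroids[c] = weights[idx == c].mean()

quantized = centroids[idx]   # every weight replaced by its shared centroid
```

After this, `quantized` contains at most `k` distinct values, so each weight can be stored as a small integer index into the codebook.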
Centroid initialization
There are several ways to initialize the centroids. The authors' experiments show that linear initialization works best: large-magnitude weights play an important role but are few in number, and linear initialization spaces the centroids evenly over the weight range so the large weights keep a nearby centroid, while the other schemes place most centroids where weights are dense (near zero) and neglect the large values.
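The three initialization schemes the paper compares can be sketched as follows (the weight distribution and `k` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)
k = 8                                  # illustrative codebook size

# Forgy (random): k samples drawn from the weights themselves
random_init = rng.choice(weights, size=k, replace=False)

# Density-based: equally spaced on the CDF, so centroids crowd near zero
density_init = np.quantile(weights, np.linspace(0.0, 1.0, k))

# Linear: equally spaced over [min, max], so even the rare large-magnitude
# weights get a nearby centroid
linear_init = np.linspace(weights.min(), weights.max(), k)
```

Only the linear scheme guarantees centroids at the extremes of the range, which is why it preserves the influential large weights.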
Forward and backward propagation
Compared with ordinary training, the extra step is in the backward pass: the gradient of each shared centroid is obtained by accumulating the gradients of all weights assigned to that cluster, and the codebook is updated with that accumulated gradient during fine-tuning.
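That per-cluster accumulation is a scatter-add over the cluster indices, sketched here with illustrative shapes, data, and learning rate:

```python
import numpy as np

# Fine-tuning the codebook: the gradient of each shared centroid is the
# sum of the gradients of all weights assigned to that cluster.
rng = np.random.default_rng(0)
k = 4
idx = rng.integers(0, k, size=16)        # cluster index of each weight
grad_w = rng.normal(size=16)             # dL/dW for each weight

grad_centroid = np.zeros(k)
np.add.at(grad_centroid, idx, grad_w)    # scatter-add by cluster index

# SGD step on the codebook; the weights then read back the updated centroids
lr = 0.01                                # assumed learning rate
centroids = rng.normal(size=k)           # stand-in codebook
centroids -= lr * grad_centroid
```

The forward pass is unchanged apart from looking weights up through the codebook (`centroids[idx]`).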
HUFFMAN CODING
Huffman coding is a form of lossless compression: based on differences in occurrence frequency, it encodes different source symbols with codewords of different lengths, so frequent symbols get shorter codes. However, because it is currently difficult to implement efficiently, it is not actually used that widely in practice.
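A toy Huffman coder over the quantization indices might look like this (a sketch, not the paper's implementation; it assumes at least two distinct symbols):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code: frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    # Heap entries: (count, tie-breaker, {symbol: codeword-so-far}).
    # The tie-breaker keeps tuples comparable when counts are equal.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)      # two rarest subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_code("aaaabbc")
```

On this toy input the most frequent symbol `a` receives the shortest codeword; applied to the sharply peaked distribution of quantized weight indices, this is what yields the paper's extra storage savings.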