Paper:
DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING
Link:
https://arxiv.org/pdf/1510.00149.pdf
First, we prune the network by removing the redundant connections, keeping only the most informative connections. Next, the weights are quantized so that multiple connections share the same weight, thus only the codebook (effective weights) and the indices need to be stored. Finally, we apply Huffman coding to take advantage of the biased distribution of effective weights.
This post walks through the paper's three-stage model-compression pipeline:
pruning, trained quantization, and Huffman coding.
Pruning
Applying pruning itself is fairly simple; the hard part is using the pruned model for inference. The authors store the pruned parameters in compressed sparse column (CSC) format,
so only the surviving nonzero weights and their coordinates need to be stored.
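As a rough sketch (the threshold value and matrix shape here are illustrative, not from the paper), magnitude pruning and value-plus-coordinate storage might look like:

```python
import numpy as np

# Illustrative magnitude pruning: keep only weights above a threshold,
# then store just the surviving values and their coordinates
# (the idea behind CSC/CSR sparse storage).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)).astype(np.float32)

threshold = 0.8                    # assumed threshold, not from the paper
mask = np.abs(W) >= threshold
values = W[mask]                   # surviving (informative) weights
rows, cols = np.nonzero(mask)      # their coordinates

# Inference can scatter the stored values back into a dense matrix
# (or multiply in sparse form directly):
W_pruned = np.zeros_like(W)
W_pruned[rows, cols] = values
```

In a real CSC layout the column pointers replace one coordinate array, and the paper further stores relative rather than absolute indices, but the stored content is the same: values plus positions.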
TRAINED QUANTIZATION AND WEIGHT SHARING
This part covers two ideas: quantization and weight sharing.
We limit the number of effective weights we need to store by having multiple connections share the same weight, and then fine-tune those shared weights.
Weight sharing
We use k-means clustering to identify the shared weights for each layer of a trained network, so that all the weights that fall into the same cluster will share the same weight.
Weight sharing here is done by clustering a layer's weights with k-means, and then using each cluster's centroid as the shared weight for that cluster.
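A minimal sketch of this step, with a hand-rolled Lloyd's-iteration k-means so it stays self-contained (the codebook size `k`, iteration count, and data are illustrative):

```python
import numpy as np

# Cluster a layer's weights with k-means; afterwards only the codebook
# (centroids) and the per-weight cluster indices need to be stored.
rng = np.random.default_rng(0)
weights = rng.normal(size=256).astype(np.float32)
k = 8                                           # assumed codebook size

# Forgy-style init: pick k of the weights as starting centroids
centroids = rng.choice(weights, size=k, replace=False)
for _ in range(20):                             # plain Lloyd iterations
    idx = np.argmin(np.abs(weights[:, None] - centroids[None, :]), axis=1)
    for c in range(k):
        if np.any(idx == c):                    # skip empty clusters
            centroids[c] = weights[idx == c].mean()

quantized = centroids[idx]   # every weight replaced by its shared centroid
```

After this, `quantized` contains at most `k` distinct values, so each weight can be stored as a small integer index into the codebook.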
Centroid initialization
There are several ways to initialize the centroids. The authors' experiments show that linear initialization works best: large-magnitude weights play an important role but are few in number, and linear initialization spaces the centroids evenly over the weight range so the large weights keep a nearby centroid, while the other schemes place most centroids where weights are dense (near zero) and neglect the large values.
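The three initialization schemes the paper compares can be sketched as follows (the weight distribution and `k` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)
k = 8                                  # illustrative codebook size

# Forgy (random): k samples drawn from the weights themselves
random_init = rng.choice(weights, size=k, replace=False)

# Density-based: equally spaced on the CDF, so centroids crowd near zero
density_init = np.quantile(weights, np.linspace(0.0, 1.0, k))

# Linear: equally spaced over [min, max], so even the rare large-magnitude
# weights get a nearby centroid
linear_init = np.linspace(weights.min(), weights.max(), k)
```

Only the linear scheme guarantees centroids at the extremes of the range, which is why it preserves the influential large weights.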
Forward and backward propagation
Compared with ordinary training, the extra step is in the backward pass: the gradient of each shared centroid is obtained by accumulating the gradients of all weights assigned to that cluster, and the codebook is updated with that accumulated gradient during fine-tuning.
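That per-cluster accumulation is a scatter-add over the cluster indices, sketched here with illustrative shapes, data, and learning rate:

```python
import numpy as np

# Fine-tuning the codebook: the gradient of each shared centroid is the
# sum of the gradients of all weights assigned to that cluster.
rng = np.random.default_rng(0)
k = 4
idx = rng.integers(0, k, size=16)        # cluster index of each weight
grad_w = rng.normal(size=16)             # dL/dW for each weight

grad_centroid = np.zeros(k)
np.add.at(grad_centroid, idx, grad_w)    # scatter-add by cluster index

# SGD step on the codebook; the weights then read back the updated centroids
lr = 0.01                                # assumed learning rate
centroids = rng.normal(size=k)           # stand-in codebook
centroids -= lr * grad_centroid
```

The forward pass is unchanged apart from looking weights up through the codebook (`centroids[idx]`).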
HUFFMAN CODING
Huffman coding is a form of lossless compression: based on differences in occurrence frequency, it encodes different source symbols with codewords of different lengths, so frequent symbols get shorter codes. However, because it is currently difficult to implement efficiently, it is not actually used that widely in practice.
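A toy Huffman coder over the quantization indices might look like this (a sketch, not the paper's implementation; it assumes at least two distinct symbols):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code: frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    # Heap entries: (count, tie-breaker, {symbol: codeword-so-far}).
    # The tie-breaker keeps tuples comparable when counts are equal.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)      # two rarest subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_code("aaaabbc")
```

On this toy input the most frequent symbol `a` receives the shortest codeword; applied to the sharply peaked distribution of quantized weight indices, this is what yields the paper's extra storage savings.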