Determine the Pruning Ratio
What should the target sparsity be for each layer?
Section 1: Pruning Ratio
How should we find per-layer pruning ratios?
Deeper layers usually contain more redundancy and can be pruned more aggressively.
Sensitivity Scan
A sensitivity scan was already performed in Lab 1.
For each layer, apply a range of sparsities and plot the resulting accuracy curve; this helps when manually choosing a different compression rate for each layer.
Plotting all per-layer curves in a single figure makes the comparison more intuitive.
However, analyzing each layer's sensitivity independently ignores the correlations between the layers' weights: we do not consider the interaction between layers.
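A minimal sketch of such a scan (assuming a PyTorch `model` and an `evaluate(model)` helper that returns validation accuracy; fine-grained magnitude pruning as in Lab 1):

```python
import numpy as np
import torch
import torch.nn as nn

@torch.no_grad()
def prune_layer_(weight: torch.Tensor, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights in place (fine-grained pruning)."""
    num_zeros = int(weight.numel() * sparsity)
    if num_zeros == 0:
        return
    threshold = weight.abs().flatten().kthvalue(num_zeros).values
    weight[weight.abs() <= threshold] = 0.0

@torch.no_grad()
def sensitivity_scan(model: nn.Module, evaluate, sparsities=np.arange(0.1, 1.0, 0.1)):
    """Prune one layer at a time, record accuracy, restore, and move on."""
    curves = {}
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        original = module.weight.detach().clone()
        accs = []
        for s in sparsities:
            prune_layer_(module.weight, s)
            accs.append(evaluate(model))   # accuracy with only this layer pruned
            module.weight.copy_(original)  # restore before the next trial
        curves[name] = accs                # one curve per layer, plotted together
    return curves
```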
We can manually set an accuracy-drop threshold and read off each layer's pruning rate from the scan.
(This approach is actually very common in industry, e.g., at Intel and NVIDIA; it is robust and easy to do!)
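Given the scan results, the threshold heuristic takes only a few lines (a sketch; `dense_acc` and the 2% drop are illustrative):

```python
def pick_ratios(curves, sparsities, dense_acc, max_drop=0.02):
    """Per layer: the largest scanned sparsity whose accuracy drop stays within max_drop."""
    ratios = {}
    for name, accs in curves.items():
        feasible = [s for s, a in zip(sparsities, accs) if dense_acc - a <= max_drop]
        ratios[name] = max(feasible) if feasible else 0.0
    return ratios
```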
Recall the Lab 1 experiment in which we plotted a histogram of each layer's parameter count.
- Besides ignoring inter-layer correlations, the method above also ignores each layer's size!
- If the green layer (in that histogram) is very small, even a high pruning rate for it contributes little to compressing the overall parameter count.
Decision making therefore becomes very complex, and automated methods become necessary.
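To account for layer size, count the parameters per layer as in the Lab 1 histogram (a sketch; `vgg16` is just an example model):

```python
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16()
for name, module in model.named_modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        # Layers with few parameters contribute little to overall compression,
        # no matter how aggressively they are pruned.
        print(f"{name:20s} {module.weight.numel():>12,d}")
```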
AutoML (AMC)
Reinforcement Learning Agent
[[Reference_AMC- AutoML for Model Compression and Acceleration on Mobile Devices]]
- Critic: intuitively the signal is the error, but that alone is far from enough; we also want to penalize long latency, FLOPs, and model size (you can add different terms to the reward function).
- Actor: the action is, e.g., the sparsity ratio for each layer.
- Embedding: N, C, H, W and the index of the layer serve as features for the agent.
- Reward: model size constraints.
- What exactly is this pre-built lookup table? It seems quite interesting.
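A sketch of such a reward: the $-\text{Error}\cdot\log(\text{FLOPs})$ form follows the AMC paper's FLOPs-aware reward, while the latency term and its weight are illustrative additions:

```python
import math

def amc_reward(error, flops, latency_ms=None, latency_weight=0.1):
    """Reward = -Error * log(FLOPs), optionally penalizing latency as well."""
    reward = -error * math.log(flops)
    if latency_ms is not None:
        reward -= latency_weight * latency_ms  # hypothetical extra penalty term
    return reward
```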
The 29% figure is what Dr. Han achieved with a week of hand-tuning during his PhD, whereas AMC needs only a few GPU-hours.
- Different stages correspond to different feature-map resolutions in ResNet-50.
- 1×1 convolutions get pruned less.
- 3×3 convolutions get pruned more.
These results demonstrate AMC's superior performance.
NetAdapt
[[Reference_NetAdapt- Platform-Aware Neural Network Adaptation for Mobile Applications]]
A rule-based iterative/progressive method
- The goal of NetAdapt is to find a per-layer pruning ratio to meet a global resource constraint (e.g., latency, energy, …).
- The process is done iteratively; we take the latency constraint as an example.
- In each iteration, given a latency-reduction target ΔR, prune each layer (one at a time) just enough to meet that reduction.
- After a short-term fine-tune, measure each candidate's accuracy; keep the candidate whose pruned layer costs the least accuracy.
- Set the next latency-reduction target and repeat until the overall latency requirement is met.
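A high-level sketch of this loop; `latency_of` (backed by the pre-built latency lookup table), `prune_to_latency`, `short_finetune`, `evaluate`, and `prunable_layers` are all assumed helpers:

```python
import copy

def netadapt(model, target_latency, delta_r,
             latency_of, prune_to_latency, short_finetune, evaluate, prunable_layers):
    """Iteratively trade latency for the least accuracy loss (a sketch)."""
    while latency_of(model) > target_latency:
        budget = latency_of(model) - delta_r  # latency every candidate must reach
        candidates = []
        for layer in prunable_layers(model):
            cand = copy.deepcopy(model)
            prune_to_latency(cand, layer, budget)  # prune only this one layer
            short_finetune(cand)                   # short-term fine-tune
            candidates.append((evaluate(cand), cand))
        _, model = max(candidates, key=lambda c: c[0])  # keep the most accurate one
    return model  # long-term fine-tune afterwards
```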
Fine-tune/Train Pruned Neural Network
How should we improve performance of pruned models?
The learning rate for fine-tuning is usually 1/100 or 1/10 of the original learning rate.
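For example (illustrative numbers; assume the network was originally trained with lr=5e-2):

```python
import torch

def make_finetune_optimizer(model, original_lr=5e-2, divisor=100):
    # Fine-tune at 1/100 of the original learning rate: 5e-2 -> 5e-4.
    return torch.optim.SGD(model.parameters(), lr=original_lr / divisor, momentum=0.9)
```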
Iterative Pruning
Do not prune the model directly to the target sparsity.
- Consider pruning followed by fine-tuning as one iteration.
- Iterative pruning gradually increases the target sparsity in each iteration.
- This boosts the pruning ratio from 5× to 9× on AlexNet compared to single-step aggressive pruning.
[[Reference_Learning Both Weights and Connections for Efficient Neural Network]]
- For example, with a target sparsity of 90%: first prune to 30% sparsity and fine-tune, then prune to 70% and fine-tune, and so on until 90%.
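A sketch of this schedule, reusing the `prune_layer_` helper from the sensitivity-scan sketch above (`finetune` is an assumed helper):

```python
import torch.nn as nn

def iterative_prune(model, finetune, schedule=(0.3, 0.5, 0.7, 0.9)):
    """One iteration = prune to the next sparsity level + fine-tune."""
    for sparsity in schedule:
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                prune_layer_(module.weight, sparsity)  # magnitude pruning
        finetune(model)  # recover accuracy before the next, more aggressive step
    return model
```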
Regularization
During fine-tuning, we want to add regularization that encourages the weights to move closer to zero, so that they become easier to prune.
[[Reference_Learning Efficient Convolutional Networks through Network Slimming]]
[[Reference_Learning Both Weights and Connections for Efficient Neural Network]]
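A sketch of adding such a penalty to the task loss during fine-tuning (L1 on the weights here; `lam` is an illustrative coefficient; Network Slimming instead applies the L1 penalty to the BatchNorm scaling factors):

```python
import torch.nn as nn

def loss_with_l1(task_loss, model, lam=1e-5):
    """Task loss + L1 penalty that pushes weights toward zero before pruning."""
    l1 = sum(m.weight.abs().sum()
             for m in model.modules()
             if isinstance(m, (nn.Conv2d, nn.Linear)))
    return task_loss + lam * l1
```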
[[ADMM求解Pruning Problem]]
Lottery Ticket Hypothesis
Can we train a sparse neural network from scratch?
[[Reference_The Lottery Ticket Hypothesis- Finding Sparse, Trainable Neural Networks]]
- Dense neural network -> train -> prune (obtain the winning ticket, i.e., the sparsity pattern).
- Get a new sparse neural network by applying that pattern.
- Train it, then prune it further.
- Get a new (sparser) pattern.
- Train it and reach the same accuracy!
Note: with this procedure we still have to train a dense network at the very beginning.
Moreover, the approach is limited by model scale.
[[Reference_Stabilizing the Lottery Ticket Hypothesis]]
- For larger models, we cannot simply take the winning-ticket pattern, build the sparse network, and train it from scratch from the randomly initialized weights $W_{t=0}$; instead, the weights have to be rewound to those from an early training step $W_{t=k}$.
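A sketch of the procedure (iterative magnitude pruning); `train` and `make_masks` are assumed helpers, and swapping `init_state` ($W_{t=0}$) for weights captured at an early step $W_{t=k}$ gives the rewinding variant from the stabilizing paper:

```python
import copy
import torch

@torch.no_grad()
def apply_masks(model, masks):
    for name, module in model.named_modules():
        if name in masks:
            module.weight *= masks[name]  # zero out the pruned connections

def lottery_ticket(model, train, make_masks, rounds=3):
    init_state = copy.deepcopy(model.state_dict())  # W_{t=0}
    masks = None
    for _ in range(rounds):
        train(model)                       # train the (possibly sparse) network
        masks = make_masks(model)          # magnitude-based winning-ticket pattern
        model.load_state_dict(init_state)  # reset surviving weights to W_{t=0}
        apply_masks(model, masks)
    return model, masks
```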