Determine the Pruning Ratio
What should the target sparsity be for each layer?
Section 1: Pruning Ratio
How should we find per-layer pruning ratios?
Deeper layers usually contain more redundancy and can be pruned more aggressively.
Sensitivity Scan
A sensitivity scan was already performed in Lab 1.
For each layer, apply a range of sparsities and plot the resulting accuracy curve; this helps when manually choosing a different compression rate for each layer.
Plotting all per-layer curves in a single figure makes the comparison more intuitive.
However, analyzing each layer's sensitivity independently ignores the correlations between the layers' weights: we do not consider the interaction between layers.
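A minimal sketch of such a scan (assuming a PyTorch `model` and an `evaluate(model)` helper that returns validation accuracy; fine-grained magnitude pruning as in Lab 1):

```python
import numpy as np
import torch
import torch.nn as nn

@torch.no_grad()
def prune_layer_(weight: torch.Tensor, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights in place (fine-grained pruning)."""
    num_zeros = int(weight.numel() * sparsity)
    if num_zeros == 0:
        return
    threshold = weight.abs().flatten().kthvalue(num_zeros).values
    weight[weight.abs() <= threshold] = 0.0

@torch.no_grad()
def sensitivity_scan(model: nn.Module, evaluate, sparsities=np.arange(0.1, 1.0, 0.1)):
    """Prune one layer at a time, record accuracy, restore, and move on."""
    curves = {}
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        original = module.weight.detach().clone()
        accs = []
        for s in sparsities:
            prune_layer_(module.weight, s)
            accs.append(evaluate(model))   # accuracy with only this layer pruned
            module.weight.copy_(original)  # restore before the next trial
        curves[name] = accs                # one curve per layer, plotted together
    return curves
```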
We can manually set an accuracy-drop threshold and read off each layer's pruning rate from the scan.
(This approach is actually very common in industry, e.g., at Intel and NVIDIA; it is robust and easy to do!)
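Given the scan results, the threshold heuristic takes only a few lines (a sketch; `dense_acc` and the 2% drop are illustrative):

```python
def pick_ratios(curves, sparsities, dense_acc, max_drop=0.02):
    """Per layer: the largest scanned sparsity whose accuracy drop stays within max_drop."""
    ratios = {}
    for name, accs in curves.items():
        feasible = [s for s, a in zip(sparsities, accs) if dense_acc - a <= max_drop]
        ratios[name] = max(feasible) if feasible else 0.0
    return ratios
```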
Recall the Lab 1 experiment in which we plotted a histogram of each layer's parameter count.
- Besides ignoring inter-layer correlations, the method above also ignores each layer's size!
- If the green layer (in that histogram) is very small, even a high pruning rate for it contributes little to compressing the overall parameter count.
Decision making therefore becomes very complex, and automated methods become necessary.
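To account for layer size, count the parameters per layer as in the Lab 1 histogram (a sketch; `vgg16` is just an example model):

```python
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16()
for name, module in model.named_modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        # Layers with few parameters contribute little to overall compression,
        # no matter how aggressively they are pruned.
        print(f"{name:20s} {module.weight.numel():>12,d}")
```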
AutoML (AMC)
Reinforcement Learning Agent
[[Reference_AMC- AutoML for Model Compression and Acceleration on Mobile Devices]]
- Critic: intuitively the signal is the error, but that alone is far from enough; we also want to penalize long latency, FLOPs, and model size (you can add different terms to the reward function).
- Actor: the action is, e.g., the sparsity ratio for each layer.
- Embedding: N, C, H, W and the index of the layer serve as features for the agent.
- Reward: model size constraints.
- What exactly is this pre-built lookup table? It seems quite interesting.
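A sketch of such a reward: the $-\text{Error}\cdot\log(\text{FLOPs})$ form follows the AMC paper's FLOPs-aware reward, while the latency term and its weight are illustrative additions:

```python
import math

def amc_reward(error, flops, latency_ms=None, latency_weight=0.1):
    """Reward = -Error * log(FLOPs), optionally penalizing latency as well."""
    reward = -error * math.log(flops)
    if latency_ms is not None:
        reward -= latency_weight * latency_ms  # hypothetical extra penalty term
    return reward
```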
The 29% figure is what Dr. Han achieved with a week of hand-tuning during his PhD, whereas AMC needs only a few GPU-hours.
- Different stages correspond to different feature-map resolutions in ResNet-50.
- 1×1 convolutions get pruned less.
- 3×3 convolutions get pruned more.
These results demonstrate AMC's superior performance.
NetAdapt
[[Reference_NetAdapt- Platform-Aware Neural Network Adaptation for Mobile Applications]]
A rule-based iterative/progressive method
- The goal of NetAdapt is to find a per-layer pruning ratio to meet a global resource constraint (e.g., latency, energy, …).
- The process is done iteratively; we take the latency constraint as an example.
- In each iteration, given a latency-reduction target ΔR, prune each layer (one at a time) just enough to meet that reduction.
- After a short-term fine-tune, measure each candidate's accuracy; keep the candidate whose pruned layer costs the least accuracy.
- Set the next latency-reduction target and repeat until the overall latency requirement is met.
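A high-level sketch of this loop; `latency_of` (backed by the pre-built latency lookup table), `prune_to_latency`, `short_finetune`, `evaluate`, and `prunable_layers` are all assumed helpers:

```python
import copy

def netadapt(model, target_latency, delta_r,
             latency_of, prune_to_latency, short_finetune, evaluate, prunable_layers):
    """Iteratively trade latency for the least accuracy loss (a sketch)."""
    while latency_of(model) > target_latency:
        budget = latency_of(model) - delta_r  # latency every candidate must reach
        candidates = []
        for layer in prunable_layers(model):
            cand = copy.deepcopy(model)
            prune_to_latency(cand, layer, budget)  # prune only this one layer
            short_finetune(cand)                   # short-term fine-tune
            candidates.append((evaluate(cand), cand))
        _, model = max(candidates, key=lambda c: c[0])  # keep the most accurate one
    return model  # long-term fine-tune afterwards
```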
Fine-tune/Train Pruned Neural Network
How should we improve performance of pruned models?
The learning rate for fine-tuning is usually 1/100 or 1/10 of the original learning rate.
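For example (illustrative numbers; assume the network was originally trained with lr=5e-2):

```python
import torch

def make_finetune_optimizer(model, original_lr=5e-2, divisor=100):
    # Fine-tune at 1/100 of the original learning rate: 5e-2 -> 5e-4.
    return torch.optim.SGD(model.parameters(), lr=original_lr / divisor, momentum=0.9)
```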
Iterative Pruning
Do not prune the model directly to the target sparsity.
- Consider pruning followed by fine-tuning as one iteration.
- Iterative pruning gradually increases the target sparsity in each iteration.
- This boosts the pruning ratio from 5× to 9× on AlexNet compared to single-step aggressive pruning.
[[Reference_Learning Both Weights and Connections for Efficient Neural Network]]
- For example, with a target sparsity of 90%: first prune to 30% sparsity and fine-tune, then prune to 70% and fine-tune, and so on until 90%.
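A sketch of this schedule, reusing the `prune_layer_` helper from the sensitivity-scan sketch above (`finetune` is an assumed helper):

```python
import torch.nn as nn

def iterative_prune(model, finetune, schedule=(0.3, 0.5, 0.7, 0.9)):
    """One iteration = prune to the next sparsity level + fine-tune."""
    for sparsity in schedule:
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                prune_layer_(module.weight, sparsity)  # magnitude pruning
        finetune(model)  # recover accuracy before the next, more aggressive step
    return model
```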
Regularization
During fine-tuning, we want to add regularization that encourages the weights to move closer to zero, so that they become easier to prune.
[[Reference_Learning Efficient Convolutional Networks through Network Slimming]]
[[Reference_Learning Both Weights and Connections for Efficient Neural Network]]
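A sketch of adding such a penalty to the task loss during fine-tuning (L1 on the weights here; `lam` is an illustrative coefficient; Network Slimming instead applies the L1 penalty to the BatchNorm scaling factors):

```python
import torch.nn as nn

def loss_with_l1(task_loss, model, lam=1e-5):
    """Task loss + L1 penalty that pushes weights toward zero before pruning."""
    l1 = sum(m.weight.abs().sum()
             for m in model.modules()
             if isinstance(m, (nn.Conv2d, nn.Linear)))
    return task_loss + lam * l1
```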
[[ADMM求解Pruning Problem]]
Lottery Ticket Hypothesis
Can we train a sparse neural network from scratch?
[[Reference_The Lottery Ticket Hypothesis- Finding Sparse, Trainable Neural Networks]]
- Dense neural network -> train -> prune (obtain the winning ticket, i.e., the sparsity pattern).
- Get a new sparse neural network by applying that pattern.
- Train it, then prune it further.
- Get a new (sparser) pattern.
- Train it and reach the same accuracy!
Note: with this procedure we still have to train a dense network at the very beginning.
Moreover, the approach is limited by model scale.
[[Reference_Stabilizing the Lottery Ticket Hypothesis]]
- For larger models, we cannot simply take the winning-ticket pattern, build the sparse network, and train it from scratch from the randomly initialized weights $W_{t=0}$; instead, the weights have to be rewound to those from an early training step $W_{t=k}$.
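A sketch of the procedure (iterative magnitude pruning); `train` and `make_masks` are assumed helpers, and swapping `init_state` ($W_{t=0}$) for weights captured at an early step $W_{t=k}$ gives the rewinding variant from the stabilizing paper:

```python
import copy
import torch

@torch.no_grad()
def apply_masks(model, masks):
    for name, module in model.named_modules():
        if name in masks:
            module.weight *= masks[name]  # zero out the pruned connections

def lottery_ticket(model, train, make_masks, rounds=3):
    init_state = copy.deepcopy(model.state_dict())  # W_{t=0}
    masks = None
    for _ in range(rounds):
        train(model)                       # train the (possibly sparse) network
        masks = make_masks(model)          # magnitude-based winning-ticket pattern
        model.load_state_dict(init_state)  # reset surviving weights to W_{t=0}
        apply_masks(model, masks)
    return model, masks
```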