MIT TinyML Study Notes [3]: Pruning (continued)

Determine the Pruning Ratio

What should the target sparsity be for each layer?

Section 1: Pruning Ratio
How should we find per-layer pruning ratios?

Deeper layers usually contain more redundancy and can be pruned more aggressively.

Sensitivity Scan

We already performed a sensitivity scan in Lab 1: for each layer, apply a range of sparsities and plot the resulting accuracy curve. This helps when manually setting a different compression ratio for each layer.

Plotting all layers on a single chart makes the comparison more intuitive.

Analyzing each layer's sensitivity independently ignores the correlation between the layers' weights!
We do not consider the interaction between layers.
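
As a refresher, here is a minimal PyTorch-style sketch of such a scan. It assumes a hypothetical `evaluate(model)` helper that returns validation accuracy, and uses simple fine-grained magnitude pruning on one layer at a time.

```python
import torch

@torch.no_grad()
def sensitivity_scan(model, evaluate, sparsities=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Prune ONE layer at a time at several sparsity levels and record the
    resulting accuracy, so each layer's sensitivity curve can be plotted."""
    curves = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases / BatchNorm params
            continue
        curves[name] = []
        backup = param.clone()
        for sparsity in sparsities:
            # fine-grained magnitude pruning of this layer only
            k = int(param.numel() * sparsity)
            if k > 0:
                threshold = param.abs().flatten().kthvalue(k).values
                param.mul_((param.abs() > threshold).float())
            curves[name].append((sparsity, evaluate(model)))
            param.copy_(backup)                  # restore before the next point
    return curves
```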

![image](https://obsidian-image.oss-cn-shanghai.aliyuncs.com/202306262135815.png)

We can manually set an accuracy-drop threshold and use it to derive a pruning rate for each layer, as in the sketch below.
(In practice this method is very common in industry, e.g. at Intel and NVIDIA: robust and easy to do!)
![image](https://obsidian-image.oss-cn-shanghai.aliyuncs.com/202306262137505.png)
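
Given the scanned curves, the manual threshold rule can be approximated with a small helper. This is only a sketch, reusing the `curves` format from the scan above and a hand-picked `max_drop` accuracy threshold.

```python
def select_pruning_ratios(curves, dense_acc, max_drop=0.02):
    """Pick, per layer, the largest scanned sparsity whose accuracy stays
    within `max_drop` of the dense model's accuracy."""
    ratios = {}
    for name, curve in curves.items():
        feasible = [s for s, acc in curve if dense_acc - acc <= max_drop]
        ratios[name] = max(feasible) if feasible else 0.0
    return ratios
```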

Recall the experiment in Lab 1, where we plotted a histogram of the number of parameters in each layer.

  • Besides the interaction between layers, the method above also ignores each layer's size!
  • Suppose the green layer is very small; even a very high pruning rate there contributes little to compressing the overall parameter count (see the sketch below).
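
A quick way to see which layers actually dominate the parameter count (a minimal sketch for a PyTorch model):

```python
import torch.nn as nn

def layer_param_counts(model):
    """Number of weights per conv / linear layer: a 90% pruning rate on a
    tiny layer removes far fewer parameters than 30% on a huge layer."""
    return {
        name: module.weight.numel()
        for name, module in model.named_modules()
        if isinstance(module, (nn.Conv2d, nn.Linear))
    }
```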

So the decision making becomes very complex, which is why automated methods are necessary.

AutoML (AMC)

Reinforcement Learning Agent
[[Reference_AMC- AutoML for Model Compression and Acceleration on Mobile Devices]]
![image](https://obsidian-image.oss-cn-shanghai.aliyuncs.com/202306262149552.png)

  • Critic: intuitively this is the error, but error alone is far from enough; we also want to penalize long latency, FLOPs, and model size (you can add different terms to the reward function).
  • Actor: e.g. the sparsity ratio for each layer.
  • Embedding: N, C, H, W and the index of the layer are fed as features to the agent.

![image](https://obsidian-image.oss-cn-shanghai.aliyuncs.com/202306262207137.png)

  • Reward: model size constraints (a hedged sketch of one possible reward is given below).
  • What exactly is this pre-built lookup table? It seems quite interesting.
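
A sketch of what such a reward might look like. The function name and the soft latency penalty are illustrative assumptions, not the exact AMC implementation; the AMC paper itself uses rewards such as -error (with the resource budget enforced through the action space) or -error * log(FLOPs).

```python
import math

def compression_reward(error, flops=None, latency_ms=None, latency_budget_ms=None):
    """One possible shape for the reward: always penalize error, and
    optionally penalize resource usage (FLOPs / latency)."""
    reward = -error
    if flops is not None:
        reward = -error * math.log(flops)                    # FLOPs-aware variant
    if latency_ms is not None and latency_budget_ms is not None:
        reward -= max(0.0, latency_ms - latency_budget_ms)   # hypothetical soft penalty
    return reward
```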

The 29% result took Dr. Han a week of manual tuning during his PhD, whereas AMC only needs a few hours on a GPU.

![image](https://obsidian-image.oss-cn-shanghai.aliyuncs.com/202306262215841.png)

  • Different stages mean different resolutions in ResNet-50:
    • 1x1 convolutions are pruned less
    • 3x3 convolutions are pruned more

The superior performance of AMC:

NetAdapt

[[Reference_NetAdapt- Platform-Aware Neural Network Adaptation for Mobile Applications]]

A rule-based iterative/progressive method

  • The goal of NetAdapt is to find per-layer pruning ratios that meet a global resource constraint (e.g., latency, energy, ...)
    • The process is done iteratively
    • We take latency constraint as an example

  • Given a target latency reduction for this step, prune each layer (one candidate at a time) just enough to meet that latency reduction.

  • After a short fine-tune, measure the accuracy of each candidate and keep the pruning of the layer that loses the least accuracy.

  • Set the next latency reduction target and repeat until the overall latency requirement is met (a sketch of this loop is given below).

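A sketch of that outer loop, with every helper (latency measurement, per-layer pruning, short-term fine-tuning, evaluation) left as a hypothetical callable:

```python
def netadapt(model, layers, latency_budget, delta_latency,
             measure_latency, prune_one_layer, short_term_finetune, evaluate):
    """NetAdapt outer loop (a sketch). Assumed helpers:
    measure_latency(model) -> ms, prune_one_layer(model, layer, delta) ->
    a copy of the model with that one layer pruned just enough to save
    `delta` latency, short_term_finetune(model), evaluate(model) -> acc."""
    while measure_latency(model) > latency_budget:
        candidates = []
        for layer in layers:
            proposal = prune_one_layer(model, layer, delta_latency)
            short_term_finetune(proposal)            # a few quick iterations
            candidates.append((evaluate(proposal), proposal))
        # keep the candidate with the smallest accuracy loss
        _, model = max(candidates, key=lambda c: c[0])
    return model  # followed by long-term fine-tuning in the paper
```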

Fine-tune/Train Pruned Neural Network

How should we improve performance of pruned models?

The learning rate for fine-tuning is usually 1/100 to 1/10 of the original learning rate.
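
For instance (a minimal sketch; the stand-in model and the original learning rate are placeholders):

```python
import torch
import torch.nn as nn

pruned_model = nn.Linear(128, 10)   # stand-in for the pruned network
original_lr = 0.1                   # assumption: LR used to train the dense model

# Fine-tune with roughly 1/100 - 1/10 of the original learning rate.
optimizer = torch.optim.SGD(pruned_model.parameters(),
                            lr=original_lr / 100, momentum=0.9)
```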

Iterative Pruning

Do not prune the model directly to the target sparsity.

  • Consider pruning followed by fine-tuning as one iteration.

  • Iterative pruning gradually increases the target sparsity in each iteration.

  • This boosts the pruning ratio from 5× to 9× on AlexNet compared with single-step aggressive pruning.
    [[Reference_Learning Both Weights and Connections for Efficient Neural Network]]

  • For example, if the target sparsity is 90%, first prune to 30% and fine-tune, then prune to 70% and fine-tune, and so on (see the sketch below).
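
A sketch of this schedule, assuming hypothetical `prune_to` and `finetune` helpers:

```python
def iterative_prune(model, prune_to, finetune, schedule=(0.3, 0.5, 0.7, 0.9)):
    """Gradually raise the target sparsity, fine-tuning after every step.
    `prune_to(model, s)` (magnitude pruning to sparsity s) and
    `finetune(model)` are assumed helpers."""
    for sparsity in schedule:
        prune_to(model, sparsity)   # prune only up to the current target
        finetune(model)             # recover accuracy before pruning further
    return model
```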

Regularization

During fine-tuning, we want to add regularization that encourages the weights to move closer to zero, so that they become easier to prune (see the sketch after the references below).

[[Reference_Learning Efficient Convolutional Networks through Network Slimming]]
[[Reference_Learning Both Weights and Connections for Efficient Neural Network]]
![image](https://obsidian-image.oss-cn-shanghai.aliyuncs.com/202306262303920.png)
[[ADMM求解Pruning Problem]]
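
A minimal sketch of adding such a regularizer to the training loss (the L1-on-weights variant; Network Slimming instead applies the L1 penalty to the BatchNorm scale factors):

```python
def loss_with_l1(task_loss, model, l1_lambda=1e-5):
    """Add an L1 penalty on the weights during fine-tuning so they are
    pushed toward zero and become easier to prune."""
    l1 = sum(p.abs().sum() for p in model.parameters() if p.dim() > 1)
    return task_loss + l1_lambda * l1
```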

Lottery Ticket Hypothesis

Can we train a sparse neural network from scratch?


[[Reference_The Lottery Ticket Hypothesis- Finding Sparse, Trainable Neural Networks]]


  • Dense neural network -> train -> prune (get the winning ticket, i.e. the sparsity pattern).
  • Get a new sparse neural network by applying that pattern.
  • Train it, and then prune it further.
  • Get a new, even sparser pattern.
  • Train it and still reach the same accuracy! (A sketch of this iterative procedure is given right after this list.)
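
A sketch of this iterative magnitude-pruning loop, assuming hypothetical `train` and `magnitude_mask` helpers:

```python
import copy
import torch

def find_winning_ticket(model, train, magnitude_mask, rounds=3, per_round=0.5):
    """Iterative magnitude pruning for the lottery ticket hypothesis (a sketch).
    `train(model)` (assumed to keep already-masked weights at zero) and
    `magnitude_mask(model, sparsity)` -> {name: 0/1 tensor} are assumed helpers."""
    init_state = copy.deepcopy(model.state_dict())      # W_{t=0}
    masks, sparsity = None, 0.0
    for _ in range(rounds):
        train(model)                                     # train the current (masked) network
        sparsity = 1.0 - (1.0 - sparsity) * (1.0 - per_round)
        masks = magnitude_mask(model, sparsity)          # the "winning ticket" pattern
        model.load_state_dict(init_state)                # rewind surviving weights to init
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name in masks:
                    param.mul_(masks[name])              # re-apply the sparsity pattern
    return model, masks
```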

Note: with this approach we still have to train a dense network at the very beginning.
Moreover, it has limitations at larger model scales.

[[Reference_Stabilizing the Lottery Ticket Hypothesis]]

  • For larger models, we cannot simply take the pattern, apply it to obtain a sparse network, and then train from scratch with the randomly initialized weights $W_{t=0}$; the stabilized version instead rewinds the surviving weights to an early training checkpoint $W_{t=k}$.
