Daily Paper 230929 -- Like What You Like

Undefined游侠

Last modified: 2023-09-30 13:34:08


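First published: 2023-09-30 13:31:35

Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/qq_19859865/article/details/133420562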


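Paper: Like What You Like: Knowledge Distill via Neuron Selectivity Transfer (Zehao Huang, Naiyan Wang, 2017), https://arxiv.org/pdf/1707.01219.pdf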

Three approaches to model acceleration:

* Network pruning

Network pruning iteratively prunes the neurons or weights of low importance based on certain criteria.

* Network quantization

Network quantization tries to reduce the precision of the weights or features.

* Knowledge transfer (KT) / knowledge distillation (KD)

KT-based methods directly train a smaller student network, which accelerates the original network in terms of wall-clock time without bells and whistles. The basic idea of knowledge distillation (KD) is to distill knowledge from a large teacher model into a small one by having the student learn the class distributions provided by the teacher via a softened softmax.
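To make the three families concrete, here is a small NumPy sketch of one common instance of each: magnitude-based pruning (using |w| as the importance criterion), symmetric per-tensor uniform quantization, and a Hinton-style distillation loss on temperature-softened softmax outputs. These particular choices, the temperature `t = 4.0`, and all function names are illustrative assumptions, not what this paper prescribes. In practice the KD term is usually added to the ordinary cross-entropy on the true labels.

```python
import numpy as np


def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the `sparsity` fraction of entries with the smallest |w| (ties may prune a few more)."""
    k = int(round(sparsity * w.size))
    if k <= 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(w) <= threshold, 0.0, w)


def quantize_symmetric(w: np.ndarray, bits: int = 8):
    """Map w to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1] with one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(w).max()) / qmax
    if scale == 0.0:  # all-zero tensor: nothing to scale
        return np.zeros(w.shape, dtype=np.int32), 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale  # dequantize with q * scale


def softmax(z: np.ndarray, t: float = 1.0) -> np.ndarray:
    """Softmax over the last axis with temperature t."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits: np.ndarray, teacher_logits: np.ndarray, t: float = 4.0) -> float:
    """KL(teacher || student) on temperature-softened softmax, scaled by t^2, averaged over the batch."""
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    eps = 1e-12  # guards log(0)
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s + eps))).sum(axis=-1)
    return float(kl.mean() * t * t)


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_symmetric(w, bits=8)
student, teacher = rng.normal(size=(4, 10)), rng.normal(size=(4, 10))
print((w_pruned == 0).mean(), np.abs(q * scale - w).max(), kd_loss(student, teacher))
```

The `t * t` factor keeps the gradient magnitude of the softened term roughly comparable across temperatures, which is the usual reason it appears in KD implementations.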

Key concepts

Maximum Mean Discrepancy

The authors adopt Maximum Mean Discrepancy (MMD) to measure the distance between two distributions. In the paper's words, it is

a distance metric for probability distributions based on the data samples sampled from them
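Concretely, for samples x_1..x_n drawn from P, y_1..y_m drawn from Q, and a kernel k, the standard (biased) empirical estimate is MMD^2 = (1/n^2) sum_{i,i'} k(x_i, x_i') + (1/m^2) sum_{j,j'} k(y_j, y_j') - (2/(nm)) sum_{i,j} k(x_i, y_j). A short NumPy sketch with a Gaussian kernel; the bandwidth `sigma = 1.0` and the function names are arbitrary assumptions for illustration:

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)) for rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # clamp tiny negatives from round-off
    return np.exp(-sq_dist / (2.0 * sigma**2))


def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between samples x (n, d) and y (m, d)."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y_same = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as x
y_shift = rng.normal(1.5, 1.0, size=(200, 2))  # mean-shifted distribution
print(mmd2(x, y_same), mmd2(x, y_shift))
```

The estimate is near zero when both sample sets come from the same distribution and grows as the distributions move apart; it compares the sets as a whole rather than pairing individual samples.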

Neuron Selectivity Transfer

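Neuron Selectivity Transfer (NST) is the method this paper proposes. It treats the activations of each channel (neuron) of an intermediate feature map, taken over all spatial positions, as one sample, and trains the student so that the distribution of its neurons' selectivity patterns matches the teacher's, with MMD as the matching loss.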

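Motivation

The goal is for the student model to mimic the outputs of the teacher model's intermediate feature layers.

Why not simply compare the teacher's and the student's feature outputs directly? The authors explain: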

As for distribution matching, it is not a good choice to directly match the samples from it, since it ignores the sample density in the space.

Therefore, the features should instead be matched with an advanced distribution alignment method.

Loss definition

The MMD loss defined by the authors is as follows:
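Written out following the paper's notation (a reconstruction worth checking against the PDF; the hat shorthand for a normalized row is introduced here only for readability), the total objective adds a weighted MMD term to the usual cross-entropy:

```latex
\begin{align*}
\mathcal{L}_{\mathrm{NST}}(\mathbf{W}_S)
  &= \mathcal{H}(\mathbf{y}_{\mathrm{true}}, \mathbf{p}_S)
   + \frac{\lambda}{2}\,\mathcal{L}_{\mathrm{MMD}^2}(\mathbf{F}_T, \mathbf{F}_S) \\
\mathcal{L}_{\mathrm{MMD}^2}(\mathbf{F}_T, \mathbf{F}_S)
  &= \frac{1}{C_T^2}\sum_{i=1}^{C_T}\sum_{i'=1}^{C_T} k\big(\hat{\mathbf{f}}_T^{\,i}, \hat{\mathbf{f}}_T^{\,i'}\big)
   + \frac{1}{C_S^2}\sum_{j=1}^{C_S}\sum_{j'=1}^{C_S} k\big(\hat{\mathbf{f}}_S^{\,j}, \hat{\mathbf{f}}_S^{\,j'}\big) \\
  &\quad - \frac{2}{C_T C_S}\sum_{i=1}^{C_T}\sum_{j=1}^{C_S} k\big(\hat{\mathbf{f}}_T^{\,i}, \hat{\mathbf{f}}_S^{\,j}\big),
  \qquad \hat{\mathbf{f}}^{\,k} = \frac{\mathbf{f}^{k\cdot}}{\lVert \mathbf{f}^{k\cdot} \rVert_2}
\end{align*}
```

Here F_T (C_T x HW) and F_S (C_S x HW) are the teacher and student feature maps at the matched layers, with each row f^{k.} holding one channel's activations over the HW spatial positions, so the two maps must share the same spatial size while the channel counts may differ. H is the cross-entropy with the true labels, p_S the student's predicted class probabilities, and λ the weight of the transfer term. Because each row is L2-normalized, only a channel's spatial pattern, not its scale, enters the loss.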

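The kernel function k can be chosen in several ways; the paper considers a linear kernel, a polynomial kernel, and a Gaussian kernel.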
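A minimal NumPy sketch of the NST MMD term with the three kernels: linear k(x, y) = x^T y, polynomial k(x, y) = (x^T y + c)^d, and Gaussian k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). The defaults used here (d = 2, c = 0; sigma^2 set to the mean squared pairwise distance) follow my reading of the paper's setup and should be treated as assumptions, and the function names `nst_mmd2` and `nst_total_loss` are hypothetical. A real training loop would compute this on a deep learning framework's tensors so it can be backpropagated; NumPy is used only to show the arithmetic.

```python
import numpy as np


def _channels_as_samples(feat: np.ndarray) -> np.ndarray:
    """(C, H, W) feature map -> (C, H*W) matrix; each row is one channel, L2-normalised."""
    f = feat.reshape(feat.shape[0], -1)
    return f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)


def _sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    d = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clamp round-off negatives


def _kernel(a, b, kind, c=0.0, d=2, sigma2=1.0):
    """Kernel matrix between rows of a and rows of b."""
    if kind == "linear":
        return a @ b.T
    if kind == "poly":
        return (a @ b.T + c) ** d
    if kind == "gaussian":
        return np.exp(-_sq_dists(a, b) / (2.0 * sigma2))
    raise ValueError(f"unknown kernel: {kind!r}")


def nst_mmd2(feat_t: np.ndarray, feat_s: np.ndarray, kind: str = "poly") -> float:
    """Squared MMD between teacher (C_T, H, W) and student (C_S, H, W) channel distributions."""
    ft, fs = _channels_as_samples(feat_t), _channels_as_samples(feat_s)
    sigma2 = 1.0
    if kind == "gaussian":
        # Assumed bandwidth heuristic: mean squared distance over all pairs of samples.
        z = np.concatenate([ft, fs], axis=0)
        sigma2 = float(_sq_dists(z, z).mean()) + 1e-12

    def k(a, b):
        return _kernel(a, b, kind, sigma2=sigma2)

    # Means over kernel matrices give the 1/C_T^2, 1/C_S^2 and 1/(C_T C_S) normalisations.
    return float(k(ft, ft).mean() + k(fs, fs).mean() - 2.0 * k(ft, fs).mean())


def nst_total_loss(ce: float, feat_t: np.ndarray, feat_s: np.ndarray,
                   lam: float, kind: str = "poly") -> float:
    """Cross-entropy on the true labels plus (lam / 2) * MMD^2."""
    return ce + 0.5 * lam * nst_mmd2(feat_t, feat_s, kind)


rng = np.random.default_rng(0)
teacher = rng.normal(size=(64, 8, 8))  # C_T = 64 channels
student = rng.normal(size=(32, 8, 8))  # C_S = 32 channels, same 8x8 spatial size
for kind in ("linear", "poly", "gaussian"):
    print(kind, nst_mmd2(teacher, student, kind))
```

Since the two sets of channels are compared as distributions, reordering the student's channels leaves the loss unchanged and C_T need not equal C_S; only the spatial size H x W has to match.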

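Experiments

Image classification

The ImageNet experiments show that, among the methods used on their own, KD achieves the best results,

while among the combinations of two methods, KD+NST performs best.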

Object detection

On the object detection task, KD+NST also achieves better results on PASCAL VOC 2007.

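Links

Zhihu: "How should one evaluate the three papers on deep model compression that TuSimple released in quick succession?"

An MIT TinyML course I found on YouTube, quite interesting:

https://www.youtube.com/playlist?list=PL80kAHvQbh-ocildRaxjjBy6MR1ZsNCU7

Publications - MIT HAN Lab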
