Model Compression

Whether you’re new to computer vision or an expert, you’ve probably heard about AlexNet winning the ImageNet challenge in 2012. That was the turning point in computer vision history because it showed that deep learning models can perform tasks which were considered very difficult for computers, with an unprecedented level of accuracy.

But did you know that AlexNet had 62 million trainable parameters?

Interesting, right?

Another popular model, VGGNet, which came out in 2014, had even more: 138 million trainable parameters.

That’s more than 2 times that of AlexNet.

You might be thinking… I know that the deeper the model is, the better it will perform. So why are you highlighting the number of parameters? The deeper the network, the more parameters it will obviously have.

[Image: table in which #Parameters and #MACCs are in the order of millions [8]]

Sure, these deep models have been benchmarks in the computer vision industry. But when you want to create a real-world application, would you choose these models?

I guess the real question we should ask here is: CAN YOU USE THESE MODELS IN YOUR APPLICATION?

Hold that thought for just a minute!

Let me digress for a bit before I get to the answer. (But feel free to skip to the end.)

The number of IoT devices is expected to reach 125–500 billion by 2030 and, assuming that 20% of them will have cameras, IoT devices with cameras are a 13–100 billion unit market. [9,10,11]

IoT camera devices include home security cameras (such as Amazon Ring and Google Nest) that open the door when you reach home or notify you if it sees an unknown person, cameras on smart vehicles that assist your driving, or cameras at a parking lot that open the gate when you enter or exit, just to name a few! Some of these IoT devices are already using AI to some extent and others are catching up slowly.

Many real-world applications demand real-time, on-device processing capabilities. A self-driving car is a perfect example of this. For a car to drive down any road safely, it must observe the road in real time and stop if a person walks in front of it. In such a case, processing visual information and making a decision need to be done in real time, on the device.

So, returning to the earlier question: CAN YOU USE THESE MODELS IN YOUR APPLICATION?

If you’re using computer vision, there’s a high chance your application requires an IoT device, and judging by the forecast for IoT devices, you’re in good company.

The main challenge is that IoT devices are resource-constrained; they have limited memory and low compute power. The more trainable parameters a model has, the bigger it is. The inference time of a deep learning model also increases with the number of trainable parameters. Moreover, models with many parameters require more energy and space than smaller networks with fewer parameters. The end result is that a big model is difficult to deploy on resource-constrained devices. While these models have been successful in achieving great results in a lab, they aren’t usable in many real-world applications.
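
To make the link between parameter count and model size concrete, here is a rough back-of-the-envelope sketch. It assumes every parameter is stored as a 32-bit float (4 bytes) and ignores any file-format overhead; the helper name `model_size_mb` is made up for this illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each weight is one 32-bit float


def model_size_mb(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


# Parameter counts quoted earlier in this article.
alexnet_mb = model_size_mb(62_000_000)
vggnet_mb = model_size_mb(138_000_000)

print(f"AlexNet: ~{alexnet_mb:.0f} MB, VGGNet: ~{vggnet_mb:.0f} MB")
```

Under these assumptions the weights alone come to roughly 248 MB for AlexNet and 552 MB for VGGNet, which is a lot to ask of a device with limited memory.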

In the lab, you have expensive, high-speed GPUs to get this level of performance [1], but when you deploy in the real world, cost, power, heat and other issues preclude the “just throw more iron at it” strategy.

Deploying deep learning models in the cloud is an option, as it can provide high computational and storage availability. However, it will have poor response times due to network latency, which is unacceptable in many real-time applications (and don’t get me started on network connectivity’s impact on overall reliability, or privacy!).

In short, AI needs to process close to the data source, preferably on the IoT device itself!

That leaves us with one option: reducing the size of the model.

Making a smaller model that can run under the constraints of edge devices is a key challenge, and it has to be done without compromising on accuracy. It is not enough to have a small model that can run on resource-constrained devices. It should perform well, both in terms of accuracy and inference speed.

So how do you fit these models on limited devices? How do you make them usable in real-world applications?

Here are a few techniques that can be used to reduce model size so that you can deploy your models on an IoT device.

Pruning

Pruning reduces the number of parameters by removing redundant, unimportant connections to which the model’s performance is not sensitive. This not only helps reduce the overall model size but also saves on computation time and energy.
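
As a minimal sketch of the idea, the snippet below prunes a flat list of weights by magnitude: the weights with the smallest absolute values are treated as unimportant and set to zero. Magnitude is only one possible importance criterion and is an assumption here, as are the function name `magnitude_prune` and the example weights.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


# Hypothetical weights of one small layer.
layer_weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.03]
pruned = magnitude_prune(layer_weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.4, 0.0]
```

Note that the zeros only translate into a smaller file or faster inference when the model is stored in a sparse format or run with kernels that skip zero weights.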

Pros:

  • Can be applied during or after training
  • Can improve the inference time / model size vs. accuracy tradeoff for a given architecture [12]
  • Can be applied to both convolutional and fully connected layers

Cons:

  • Generally does not help as much as switching to a better architecture [12]
  • Implementations that improve latency are rare; TensorFlow’s, for instance, only brings model-size benefits

[Image: speed and size tradeoff for original and pruned models [13]]

Quantization

In a DNN, weights are typically stored as 32-bit floating-point numbers. Quantization is the idea of representing these weights with fewer bits. The weights can be quantized to 16 bits, 8 bits, 4 bits or even 1 bit. By reducing the number of bits used, the size of the deep neural network can be significantly reduced.
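
One simple way to picture this is uniform 8-bit quantization, sketched below: each float weight is mapped to an integer code between 0 and 255 using a shared scale and offset, and mapped back when needed. This particular scheme, the function names and the example weights are assumptions for illustration; real frameworks offer several variants.

```python
def quantize_uint8(weights):
    """Map float weights to integer codes in [0, 255] with a shared scale and offset."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255
    if scale == 0:
        scale = 1.0  # all weights are equal; any non-zero scale works
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover approximate float weights from the integer codes."""
    return [c * scale + w_min for c in codes]


# Hypothetical float weights.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, w_min = quantize_uint8(weights)
restored = dequantize(codes, scale, w_min)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(f"largest rounding error: {max_error:.5f}")
```

Each code fits in one byte instead of four, so the weights take about a quarter of the space, at the cost of a rounding error of at most half the scale per weight.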

Pros:

  • Can be applied both during and after training
  • Can be applied to both convolutional and fully connected layers

Cons:

  • Quantized weights make neural networks harder to converge. A smaller learning rate is needed to ensure that the network performs well. [13]
  • Quantized weights make back-propagation infeasible, since gradients cannot back-propagate through discrete neurons. Approximation methods are needed to estimate the gradients of the loss function with respect to the input of the discrete neurons. [13]
  • TensorFlow’s quantization-aware training does not do any quantization during the training itself. Only statistics are gathered during training, and those are used to quantize the model after training. So I am not sure whether the above points should be included as cons.

Knowledge distillation

In knowledge distillation, a large, complex model is first trained on a large dataset. Once this large model generalizes and performs well on unseen data, its knowledge is transferred to a smaller network. The larger model is known as the teacher model and the smaller network as the student network.
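
The transfer step is often implemented by training the student to match the teacher’s softened output probabilities. The sketch below shows one common form of such a loss, a cross-entropy between temperature-scaled softmax outputs. The article does not specify a loss, so this formulation, the temperature value and the example logits are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperatures give softer probabilities."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Hypothetical logits for a three-class problem.
teacher_logits = [5.0, 2.0, 0.5]
close_student = [4.5, 2.2, 0.4]  # agrees with the teacher
far_student = [0.5, 2.0, 5.0]    # disagrees with the teacher

loss_close = distillation_loss(close_student, teacher_logits)
loss_far = distillation_loss(far_student, teacher_logits)
print(loss_close < loss_far)  # True
```

Minimizing this loss pushes the student toward the teacher’s behaviour; in practice it is usually combined with the ordinary loss on the true labels.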

Pros:

  • If you have a pre-trained teacher network, less training data is required to train the smaller (student) network.
  • If you have a pre-trained teacher network, training of the smaller (student) network is faster.
  • Can downsize a network regardless of the structural difference between the teacher and the student network.

Cons:

  • If you do not have a pre-trained teacher network, you may need a larger dataset and more time to train one.

Selective Attention

Selective attention is the idea of focusing on objects or elements of interest, while discarding the others (often background or other task-irrelevant objects). It is inspired by the biology of the human eye. When we look at something, we only focus on one or a few objects at a time, and other regions are blurred out.

This requires adding a selective attention network upstream of your existing AI system, or using it by itself if that serves your purpose. It depends on the problem you are trying to solve.

Pros:

  • Faster inference
  • Smaller model (e.g. a face detector and cropper that’s only 44 KB!)
  • Accuracy gain (by focusing downstream AI on only the regions/objects of interest)

Cons:

  • Only supports training from scratch

Low-rank factorization

Low-rank factorization uses matrix/tensor decomposition to estimate the informative parameters. A weight matrix A of dimension m × n and rank r is replaced by matrices of smaller dimensions. In other words, this technique factorizes a large matrix into smaller matrices.
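
The toy example below illustrates why this saves parameters. It builds a 4 × 3 matrix A that is exactly the product of a 4 × 1 matrix U and a 1 × 3 matrix V (so r = 1), then checks that applying U and V in sequence gives the same output as applying A. The matrices are made up for illustration; real weight matrices are usually only approximately low rank, so the factors are typically obtained with an approximation such as a truncated SVD.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in left]


# Hypothetical rank-1 factors: U is m x r, V is r x n.
U = [[1.0], [2.0], [0.5], [-1.0]]
V = [[3.0, 0.0, -2.0]]
A = matmul(U, V)  # the full m x n weight matrix

x = [1.0, 2.0, 3.0]
full_output = matvec(A, x)                 # one m x n product
factored_output = matvec(U, matvec(V, x))  # two smaller products

m, n, r = len(U), len(V[0]), len(V)
full_params = m * n            # 12 parameters
factored_params = r * (m + n)  # 7 parameters

print(full_output == factored_output, full_params, factored_params)  # True 12 7
```

In general the factored form stores r(m + n) values instead of mn, so it pays off whenever r is smaller than mn / (m + n).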

Pros:

  • Can be applied during or after training
  • Can be applied to both convolutional and fully connected layers
  • When applied during training, can reduce training time

The best part is that all of the above techniques are complementary to each other. They can be applied as is or combined with one or more of the other techniques. For example, by using a three-stage pipeline of pruning, quantization and Huffman coding to reduce the size of the pre-trained model, a VGG16 model trained on the ImageNet dataset was reduced from 550 MB to 11.3 MB.
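
To give a feel for the last stage of such a pipeline, here is a small sketch of Huffman coding applied to already-quantized weights. When a few codes (for example, the code used for pruned weights) occur far more often than others, Huffman coding gives them shorter bit strings. The 2-bit codes and their frequencies below are invented for illustration and are not taken from the VGG16 result.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (frequency, unique tie-breaker, {symbol: depth so far}).
    heap = [(freq, i, {sym: 0}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        freq1, _, depths1 = heapq.heappop(heap)
        freq2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (freq1 + freq2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical 2-bit quantized weights in which code 0 dominates.
quantized = [0] * 70 + [1] * 15 + [2] * 10 + [3] * 5
lengths = huffman_code_lengths(quantized)

fixed_bits = 2 * len(quantized)                     # 2 bits per weight
huffman_bits = sum(lengths[s] for s in quantized)   # variable-length codes
print(fixed_bits, huffman_bits)  # 200 145
```

The saving comes on top of what pruning and quantization already achieved, which is why the stages combine well.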

Most of the techniques discussed above can be applied to pre-trained models, as a post-processing step to reduce your model size and increase inference speed. But they can be applied during training time as well. Quantization is gaining popularity and has now been baked into machine learning frameworks. We can expect pruning to be baked into popular frameworks very soon.

In this article, we looked at the motivation for deploying deep-learning-based models to resource-constrained devices such as IoT devices, and at the need to reduce model size so that the models fit without compromising accuracy. We also discussed the pros and cons of some modern techniques for compressing deep-learning models. Finally, we touched on the idea that each of the techniques can be applied individually or in combination.

Be sure to explore all the techniques for your model, after training as well as during training, and figure out what works best for you.

Which model compression techniques have worked best for you? Leave comments below.

Want to train your own selective attention network? Click here.

Originally published at www.xailient.com/blog.

About the author

Sabina Pokhrel works at Xailient, a computer-vision start-up that has built the world’s fastest Edge-optimized object detector.

  1. https://towardsdatascience.com/machine-learning-models-compression-and-quantization-simplified-a302ddf326f2
  2. Buciluǎ, C., Caruana, R., & Niculescu-Mizil, A. (2006, August). Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 535–541).
  3. Cheng, Y., Wang, D., Zhou, P., & Zhang, T. (2017). A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282.
  4. http://mitchgordon.me/machine/learning/2020/01/13/do-we-really-need-model-compression.html
  5. https://software.intel.com/content/www/us/en/develop/articles/compression-and-acceleration-of-high-dimensional-neural-networks.html
  6. https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96
  7. https://www.learnopencv.com/number-of-parameters-and-tensor-sizes-in-convolutional-neural-network/
  8. Véstias, M. P. (2019). A survey of convolutional neural networks on edge with reconfigurable computing. Algorithms, 12(8), 154.
  9. https://technology.informa.com/596542/number-of-connected-iot-devices-will-surge-to-125-billion-by-2030-ihs-markit-says
  10. https://www.cisco.com/c/dam/en/us/products/collateral/se/internet-of-things/at-a-glance-c45-731471.pdf
  11. Mohan, A., Gauen, K., Lu, Y. H., Li, W. W., & Chen, X. (2017, May). Internet of video things in 2030: A world with many cameras. In 2017 IEEE International Symposium on Circuits and Systems (ISCAS) (pp. 1–4). IEEE.
  12. Blalock, D., Ortiz, J. J. G., Frankle, J., & Guttag, J. (2020). What is the state of neural network pruning? arXiv preprint arXiv:2003.03033.
  13. Guo, Y. (2018). A survey on methods and theories of quantized neural networks. arXiv preprint arXiv:1808.04752.

Translated from: https://towardsdatascience.com/model-compression-needs-and-importance-6e5913996e1
