

Whether you’re new to computer vision or an expert, you’ve probably heard about AlexNet winning the ImageNet challenge in 2012. That was the turning point in computer vision history because it showed that deep learning models can perform tasks which were considered very difficult for computers, with an unprecedented level of accuracy.


But did you know that AlexNet had 62 million trainable parameters?


Interesting, right?


Another popular model, VGGNet, which came out in 2014, had even more: 138 million trainable parameters.


That’s more than 2 times that of AlexNet.


You might be thinking… I know that the deeper the model is, the better it will perform. So why are you highlighting the number of parameters? It is obvious that the deeper the network, the more parameters it will have.

#Parameters and #MACCs are in the order of millions in the above table [8]

Sure, these deep models have been benchmarks in the computer vision industry. But when you want to create a real-world application, would you choose these models?


I guess the real question we should ask here is: CAN YOU USE THESE MODELS IN YOUR APPLICATION?


Hold that thought for just a minute!


Let me divert here for a bit, before I get to the answer. (But feel free to skip to the end.)


The number of IoT devices is expected to reach 125–500 billion by 2030. Assuming that 20% of them will have cameras, IoT devices with cameras represent a 13–100 billion unit market. [9,10,11]

IoT camera devices include home security cameras (such as Amazon Ring and Google Nest) that open the door when you reach home or notify you when they see an unknown person, cameras on smart vehicles that assist your driving, and cameras at parking lots that open the gate when you enter or exit, just to name a few! Some of these IoT devices already use AI to some extent, and others are slowly catching up.

Many real-world applications demand real-time, on-device processing. A self-driving car is a perfect example. For cars to drive down any road safely, they must observe the road in real time and stop if a person walks in front of the car. In such a case, processing visual information and making a decision must happen in real time, on the device.

So, returning to the earlier question: CAN YOU USE THESE MODELS IN YOUR APPLICATION?


If you’re using computer vision, there’s a high chance your application requires an IoT device, and given the forecast for IoT devices, you’re in good company.

The main challenge is that IoT devices are resource-constrained: they have limited memory and low compute power. The more trainable parameters a model has, the bigger it is, and inference time grows with the number of trainable parameters. Moreover, models with many parameters require more energy and storage than smaller networks with fewer parameters. The end result is that large models are difficult to deploy on resource-constrained devices. While these models have achieved great results in the lab, they aren’t usable in many real-world applications.
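To make the link between parameter count and model size concrete, here is a rough back-of-the-envelope sketch. It assumes each parameter is stored as a 32-bit float and counts the weights only (no activations or file-format overhead); the helper name `model_size_mb` is made up for illustration.

```python
def model_size_mb(num_params: int, bits_per_param: int = 32) -> float:
    """Approximate storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


# Parameter counts quoted earlier in this article.
for name, params in [("AlexNet", 62_000_000), ("VGGNet", 138_000_000)]:
    print(f"{name}: ~{model_size_mb(params):.0f} MB at 32 bits per parameter")
```

Under these assumptions, AlexNet needs about 248 MB and VGGNet about 552 MB just to hold the weights, which is already a lot for a small camera device.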

In the lab, you have expensive, high-speed GPUs to get this level of performance [1], but when you deploy in the real world, cost, power, heat and other issues preclude the “just throw more iron at it” strategy.


Deploying deep learning models on the cloud is an option as it can provide high computational and storage availability. However, it will have poor response times due to network latency, which is unacceptable in many real-time applications (and don’t get me started on the network connectivity’s impact on overall reliability, or privacy!).


In short, AI processing needs to happen close to the data source, preferably on the IoT device itself!


That leaves us with one option: Reducing the size of the model.


Making a smaller model that can run under the constraints of the edge-devices is a key challenge. And that too without compromising on accuracy. It is just not enough to have a small model that can run on resource constrained devices. It should perform well, both in terms of accuracy and inference speed.


So how do you fit these models on limited devices? How do you make them usable in real-world applications?


Here are a few techniques that can be used to reduce model size so that you can deploy your models on an IoT device.


Pruning

Pruning reduces the number of parameters by removing redundant, unimportant connections that are not sensitive to performance. This not only helps reduce the overall model size but also saves on computation time and energy.




Pros:

  • Can be applied during or after training

  • Can improve the inference time/model size vs. accuracy tradeoff for a given architecture [12]

  • Can be applied to both convolutional and fully connected layers

Cons:

  • Generally does not help as much as switching to a better architecture [12]

  • Implementations that improve latency are rare; TensorFlow’s, for example, only brings model size benefits

Speed and size tradeoff for original and pruned models [13]
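As a minimal sketch of the idea, the snippet below prunes a single weight matrix by magnitude: the weights with the smallest absolute values are treated as the "unimportant" connections and set to zero. Magnitude is only one possible importance criterion and is an assumption here, as are the function name `magnitude_prune`, the layer shape and the 80% sparsity target.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries set to zero."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * weights.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # True for the weights we keep
    return weights * mask


rng = np.random.default_rng(0)
layer = rng.normal(size=(64, 32))  # stand-in for one layer's weights
pruned = magnitude_prune(layer, sparsity=0.8)
print("fraction of zeros:", np.mean(pruned == 0))
```

Note that the zeros only save space once the matrix is stored in a sparse or compressed format, and they only save time if the runtime can skip them, which is the latency caveat listed above.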

Quantization

In a deep neural network (DNN), weights are typically stored as 32-bit floating-point numbers. Quantization represents these weights with fewer bits: 16-bit, 8-bit, 4-bit or even 1-bit. Reducing the number of bits per weight can significantly reduce the size of the network.



Pros:

  • Quantization can be applied both during and after training

  • Can be applied to both convolutional and fully connected layers

Cons:

  • Quantized weights make neural networks harder to converge. A smaller learning rate is needed to ensure the network performs well. [13]

  • Quantized weights make back-propagation infeasible, since gradients cannot back-propagate through discrete neurons. Approximation methods are needed to estimate the gradients of the loss function with respect to the input of the discrete neurons. [13]

  • TensorFlow’s quantization-aware training does not quantize weights during training itself. It only gathers statistics during training, which are then used to quantize the model after training. The two points above may therefore not apply in that setting.
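To illustrate the size saving, here is a minimal post-training sketch that maps 32-bit float weights onto 256 evenly spaced levels (8 bits) using the tensor's minimum and maximum. This simple min/max scheme is an assumption chosen for clarity and is not necessarily what any particular framework does; the function names are hypothetical.

```python
import numpy as np


def quantize_uint8(weights: np.ndarray):
    """Map float weights onto 256 evenly spaced levels; return (q, scale, w_min)."""
    w_min = float(weights.min())
    w_max = float(weights.max())
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:  # constant tensor: any non-zero scale works
        scale = 1.0
    q = np.clip(np.round((weights - w_min) / scale), 0, 255).astype(np.uint8)
    return q, scale, w_min


def dequantize(q: np.ndarray, scale: float, w_min: float) -> np.ndarray:
    """Recover approximate float32 weights from the 8-bit codes."""
    return q.astype(np.float32) * scale + w_min


rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)
q, scale, w_min = quantize_uint8(weights)
restored = dequantize(q, scale, w_min)

print("float32 bytes:", weights.nbytes, "uint8 bytes:", q.nbytes)
print("max abs error:", float(np.max(np.abs(weights - restored))))
```

The 8-bit copy takes a quarter of the space, and the reconstruction error per weight is bounded by about half of `scale`. Whether that error is acceptable depends on the model and the task.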

Knowledge distillation

In knowledge distillation, a large, complex model is first trained on a large dataset. Once this large model generalizes and performs well on unseen data, its knowledge is transferred to a smaller network. The larger model is known as the teacher and the smaller network as the student.



Pros:

  • If you have a pre-trained teacher network, less training data is required to train the smaller (student) network.

  • If you have a pre-trained teacher network, training the smaller (student) network is faster.

  • Can downsize a network regardless of the structural difference between the teacher and the student network.

Cons:

  • If you do not have a pre-trained teacher network, you may need a larger dataset and more time to train one.
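One common way to "transfer" the teacher's knowledge is to train the student to match the teacher's softened output probabilities. The sketch below shows that loss term for a single example in plain Python. The temperature value, the T² scaling and the function names are assumptions based on a widely used formulation; the article itself does not prescribe a specific loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


teacher = [1.0, 2.0, 4.0]  # hypothetical teacher logits for one image
student = [1.0, 2.0, 3.0]  # hypothetical student logits for the same image
print("loss when student differs:", distillation_loss(student, teacher))
print("loss when student matches:", distillation_loss(teacher, teacher))
```

In practice this term is usually combined with the ordinary loss on the true labels, and the loss is zero only when the student reproduces the teacher's distribution exactly.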


Selective Attention

Selective attention is the idea of focusing on objects or elements of interest, while discarding the others (often background or other task-irrelevant objects). It is inspired by the biology of the human eye. When we look at something, we only focus on one or a few objects at a time, and other regions are blurred out.


This requires adding a selective attention network upstream of your existing AI system or using it by itself if it serves your purpose. It depends on the problem you are trying to solve.




Pros:

  • Faster inference

  • Smaller model (e.g. a face detector and cropper that’s only 44 KB!)

  • Accuracy gain (by focusing downstream AI on only the regions/objects of interest)

Cons:

  • Supports only training from scratch
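The snippet below sketches only the data flow of such a pipeline: an upstream attention stage proposes a region of interest, and the downstream model sees just that crop. The bounding box is hard-coded and stands in for the output of a real attention network, and the frame size, box coordinates and function name are all hypothetical.

```python
import numpy as np


def crop_region(frame: np.ndarray, box) -> np.ndarray:
    """Cut the (x0, y0, x1, y1) region of interest out of an H x W x C frame."""
    x0, y0, x1, y1 = box
    return frame[y0:y1, x0:x1]


frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a camera frame
face_box = (500, 200, 724, 424)  # hypothetical output of the attention stage

crop = crop_region(frame, face_box)
fraction = (crop.shape[0] * crop.shape[1]) / (frame.shape[0] * frame.shape[1])
print("crop shape:", crop.shape)
print(f"pixels passed downstream: {fraction:.1%} of the frame")
```

Here the downstream model processes a 224 x 224 crop, a little over 5% of the original pixels, which is where the faster inference comes from.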


Low-rank factorization

Low-rank factorization uses matrix/tensor decomposition to estimate the informative parameters. A weight matrix A of dimension m x n and rank r is replaced by smaller matrices, for example two factors of size m x r and r x n, which together hold r(m + n) values instead of mn. In other words, the technique factorizes a large matrix into smaller ones.



Pros:

  • Can be applied during or after training

  • Can be applied to both convolutional and fully connected layers

  • When applied during training, can reduce training time
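A minimal sketch of the factorization, assuming a truncated singular value decomposition (SVD) is used to obtain the two smaller matrices. The matrix sizes, the rank and the function name `low_rank_factors` are made up for illustration, and the example matrix is built to have exactly rank r so the reconstruction is essentially lossless; real weight matrices are usually only approximately low-rank.

```python
import numpy as np


def low_rank_factors(A: np.ndarray, r: int):
    """Return B (m x r) and C (r x n) such that B @ C is the best rank-r fit to A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U[:, :r] * s[:r]  # fold the top r singular values into the left factor
    C = Vt[:r, :]
    return B, C


rng = np.random.default_rng(0)
m, n, r = 512, 256, 16
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))  # weight matrix of rank r

B, C = low_rank_factors(A, r)
print("values stored before:", A.size)           # m * n
print("values stored after: ", B.size + C.size)  # r * (m + n)
print("reconstruction matches:", np.allclose(A, B @ C))
```

At inference time, a layer that computed `x @ A` can compute `(x @ B) @ C` instead, which needs fewer multiplications as long as r is small compared with m and n.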


The best part is that all of the above techniques are complementary. Each can be applied on its own or combined with one or more of the others. For example, a three-stage pipeline of pruning, quantization and Huffman coding reduced a VGG16 model pre-trained on the ImageNet dataset from 550 MB to 11.3 MB, roughly a 49x reduction.
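Pruning and quantization are sketched in their own sections, so here is a small illustration of why Huffman coding helps as a third stage. After pruning and quantization, a few values (typically zero) dominate, and Huffman coding gives frequent values shorter codes. The symbol distribution below is invented for illustration, and the function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical 2-bit quantized weight indices after pruning: mostly zeros.
indices = [0] * 80 + [1] * 10 + [2] * 6 + [3] * 4
lengths = huffman_code_lengths(indices)
total_bits = sum(lengths[s] for s in indices)

print("fixed-width bits:", 2 * len(indices))
print("Huffman bits:    ", total_bits)
```

For this made-up distribution the Huffman code needs 130 bits instead of 200, on top of whatever pruning and quantization already saved.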

Most of the techniques discussed above can be applied to pre-trained models, as a post-processing step to reduce your model size and increase inference speed. But they can be applied during training time as well. Quantization is gaining popularity and has now been baked into machine learning frameworks. We can expect pruning to be baked into popular frameworks very soon.


In this article, we looked at the motivation for deploying deep-learning-based models to resource-constrained devices such as IoT devices, and the need to reduce model size so they fit without compromising accuracy. We also discussed the pros and cons of some modern techniques for compressing deep-learning models. Finally, we touched on the idea that each technique can be applied individually or combined with others.

Be sure to explore all the techniques for your model, post training as well as during training and figure out what works best for you.


Which model compression techniques have worked best for you? Leave comments below.


Want to train your own selective attention network? Click here.


Originally published at www.xailient.com/blog.

About the author


Sabina Pokhrel works at Xailient, a computer-vision start-up that has built the world’s fastest Edge-optimized object detector.


Translated from: https://towardsdatascience.com/model-compression-needs-and-importance-6e5913996e1


