RETHINKING THE VALUE OF NETWORK PRUNING: Notes

https://download.csdn.net/download/weixin_44543648/18515920

ABSTRACT:

  1. training a large, over-parameterized model is often not necessary to obtain an efficient final model
  2. learned “important” weights of the large model are typically not useful for the small pruned model
  3. the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.

1. Training a large, over-parameterized model is usually not necessary to obtain an efficient final model.
2. The "important" weights learned in the large model are of little use to the pruned small model.
3. The architecture of the pruned model matters far more than the "important" weights inherited from the large model.

INTRODUCTION

A typical procedure of network pruning consists of three stages: 1) train a large, over-parameterized model (sometimes there are pretrained models available), 2) prune the trained large model according to a certain criterion, and 3) fine-tune the pruned model to regain the lost performance.

A typical network pruning procedure consists of three stages:
1) train a large, over-parameterized model (sometimes a pretrained model is available),
2) prune the trained large model according to a certain criterion,
3) fine-tune the pruned model to recover the lost performance.
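Below is a minimal PyTorch sketch of this three-stage pipeline. The helper names `build_large_model`, `prune_by_criterion`, and `train_loader` are hypothetical placeholders, not from the paper; the training loop is a generic supervised routine.

```python
import torch.nn as nn
import torch.optim as optim

def train(model, loader, epochs, lr):
    """Generic supervised training loop, used for stage 1 and stage 3."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model

# Stage 1: train a large, over-parameterized model.
large_model = train(build_large_model(), train_loader, epochs=160, lr=0.1)

# Stage 2: prune the trained model according to a chosen criterion
# (e.g. channel weight norms); this returns a smaller model.
pruned_model = prune_by_criterion(large_model, prune_ratio=0.5)

# Stage 3: fine-tune the pruned model with a small learning rate
# to regain the performance lost by pruning.
pruned_model = train(pruned_model, train_loader, epochs=40, lr=0.001)
```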

Generally, there are two common beliefs behind this pruning procedure. First, it is believed that starting with training a large, over-parameterized network is important (Luo et al., 2017; Carreira-Perpinán & Idelbayev, 2018), as it provides a high-performance model (due to stronger representation & optimization power) from which one can safely remove a set of redundant parameters without significantly hurting the accuracy. Therefore, this is usually believed, and reported, to be superior to directly training a smaller network from scratch (Li et al., 2017; Luo et al., 2017; He et al., 2017b; Yu et al., 2018) – a commonly used baseline approach. Second, both the pruned architecture and its associated weights are believed to be essential for obtaining the final efficient model (Han et al., 2015). Thus most existing pruning techniques choose to fine-tune a pruned model instead of training it from scratch. The preserved weights after pruning are usually considered to be critical, as how to accurately select the set of important weights is a very active research topic in the literature (Molchanov et al., 2016; Li et al., 2017; Luo et al., 2017; He et al., 2017b; Liu et al., 2017; Suau et al., 2018).
In this work, the authors show that both beliefs are not necessarily true. First, for structured pruning methods with predefined target network architectures (Figure 2), directly training the small target model from random initialization can achieve the same, if not better, performance as the model obtained from the three-stage pipeline. In this case, starting with a large model is not necessary and one could instead directly train the target model from scratch. Second, for structured pruning methods with auto-discovered target networks, training the pruned model from scratch can also achieve comparable or even better performance than fine-tuning. This observation shows that for these pruning methods, what matters more may be the obtained architecture, instead of the preserved weights, even though training the large model is needed to find that target architecture.

There are two common beliefs behind network pruning:
1. Starting by training a large, over-parameterized network is believed to be important, because it provides a high-performance model (due to stronger representation and optimization power) from which a set of redundant parameters can be safely removed without significantly hurting accuracy.
2. Both the pruned architecture and its associated weights are believed to be essential for obtaining the final efficient model.
As a result, most existing pruning techniques choose to fine-tune the pruned model rather than train it from scratch.
However, the authors' work shows that these beliefs are not necessarily true. They find that:
1. Directly training the randomly initialized small target model can reach the same or similar performance as the model obtained from the three-stage pipeline. In this case, starting from a large model is unnecessary; the target model can be trained from scratch directly, without first training a large, over-parameterized model.
2. For structured pruning methods with automatically discovered target networks, training the pruned model from scratch also achieves performance comparable to, or even better than, fine-tuning.

This observation suggests that for these pruning methods, what matters more may be the obtained architecture rather than the preserved weights, even though training the large model is needed to find that target architecture.

For an unstructured pruning method (Han et al., 2015) that prunes individual parameters, we found that training from scratch can mostly achieve comparable accuracy with pruning and fine-tuning on smaller-scale datasets, but fails to do so on the large-scale ImageNet benchmark. Note that in some cases, if a pretrained large model is already available, pruning and fine-tuning from it can save the training time required to obtain the efficient model.

For unstructured pruning, training from scratch can mostly match the accuracy of pruning plus fine-tuning on smaller datasets, but fails to do so on the large-scale ImageNet benchmark. Note that in some cases, if a pretrained large model is already available, pruning and fine-tuning from it can save the training time required to obtain the efficient model.

BACKGROUND

Those large models can be infeasible to store and run in real time on embedded systems. To address this issue, many methods have been proposed, such as low-rank approximation of weights (Denton et al., 2014; Lebedev et al., 2014), weight quantization (Courbariaux et al., 2016; Rastegari et al., 2016), knowledge distillation (Hinton et al., 2014; Romero et al., 2015) and network pruning (Han et al., 2015; Li et al., 2017), among which network pruning has gained notable attention due to its competitive performance and compatibility.

To address the problems of large models, existing approaches include low-rank approximation of weights (Denton et al., 2014; Lebedev et al., 2014), weight quantization (Courbariaux et al., 2016; Rastegari et al., 2016), knowledge distillation (Hinton et al., 2014; Romero et al., 2015), and network pruning (Han et al., 2015; Li et al., 2017).

One major branch of network pruning methods is individual weight pruning, and it dates back to Optimal Brain Damage (LeCun et al., 1990) and Optimal Brain Surgeon (Hassibi & Stork, 1993), which prune weights based on the Hessian of the loss function. More recently, Han et al. (2015) proposes to prune network weights with small magnitude, and this technique is further incorporated into the "Deep Compression" pipeline (Han et al., 2016b) to obtain highly compressed models. Srinivas & Babu (2015) proposes a data-free algorithm to remove redundant neurons iteratively. Molchanov et al. (2017) uses Variational Dropout (P. Kingma et al., 2015) to prune redundant weights. Louizos et al. (2018) learns sparse networks through L0-norm regularization based on stochastic gates. However, one drawback of these unstructured pruning methods is that the resulting weight matrices are sparse, which cannot lead to compression and speedup without dedicated hardware/libraries (Han et al., 2016a).

One major branch of network pruning methods is individual weight pruning. For example, Han et al. (2015) propose pruning network weights with small magnitude, a technique further incorporated into the "Deep Compression" pipeline (Han et al., 2016b) to obtain highly compressed models. Srinivas & Babu (2015) propose a data-free algorithm that removes redundant neurons iteratively. Molchanov et al. (2017) use Variational Dropout (P. Kingma et al., 2015) to prune redundant weights. Louizos et al. (2018) learn sparse networks through L0-norm regularization based on stochastic gates.

However, a drawback of these unstructured pruning methods is that the resulting weight matrices are sparse, which cannot bring compression and speedup without dedicated hardware/libraries.
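For concreteness, here is a minimal sketch of magnitude-based unstructured pruning in the spirit of Han et al. (2015): weights whose absolute value falls below one global threshold are zeroed with a binary mask. This is an illustrative reading of the idea, not the authors' released implementation; note that the pruned tensors keep their dense shape, which is exactly why speedup requires dedicated hardware or sparse libraries.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    # Pool all conv/linear weights to derive one global threshold.
    prunable = [m for m in model.modules()
                if isinstance(m, (nn.Conv2d, nn.Linear))]
    all_weights = torch.cat([m.weight.detach().abs().flatten()
                             for m in prunable])
    k = max(1, int(sparsity * all_weights.numel()))
    threshold = all_weights.kthvalue(k).values

    masks = []
    for m in prunable:
        mask = (m.weight.detach().abs() > threshold).float()
        m.weight.data.mul_(mask)  # sparse values, but still a dense tensor
        masks.append(mask)        # reused to keep zeros fixed during fine-tuning
    return masks
```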

In contrast, structured pruning methods prune at the level of channels or even layers. Since the original convolution structure is still preserved, no dedicated hardware/libraries are required to realize the benefits. Among structured pruning methods, channel pruning is the most popular, since it operates at the most fine-grained level while still fitting in conventional deep learning frameworks. Some heuristic methods include pruning channels based on their corresponding filter weight norm (Li et al., 2017) and the average percentage of zeros in the output (Hu et al., 2016). Group sparsity is also widely used to smooth the pruning process after training (Wen et al., 2016; Alvarez & Salzmann, 2016; Lebedev & Lempitsky, 2016; Zhou et al., 2016). Liu et al. (2017) and Ye et al. (2018) impose sparsity constraints on channel-wise scaling factors during training, whose magnitudes are then used for channel pruning. Huang & Wang (2018) uses a similar technique to prune coarser structures such as residual blocks. He et al. (2017b) and Luo et al. (2017) minimize the next layer's feature reconstruction error to determine which channels to keep. Similarly, Yu et al. (2018) optimizes the reconstruction error of the final response layer and propagates an "importance score" for each channel. Molchanov et al. (2016) uses Taylor expansion to approximate each channel's influence on the final loss and prunes accordingly. Suau et al. (2018) analyzes the intrinsic correlation within each layer and prunes redundant channels. Chin et al. (2018) proposes a layer-wise compensate filter pruning algorithm to improve commonly-adopted heuristic pruning metrics. He et al. (2018a) proposes to allow pruned filters to recover during the training process. Lin et al. (2017); Wang et al. (2017) prune certain structures in the network based on the current input.

In contrast, structured pruning methods prune at the level of channels or even layers. Since the original convolution structure is preserved (only the number of channels changes, so the network framework itself does not need to change), no dedicated hardware/libraries are required to realize the benefits. Existing approaches include:
1. Pruning channels based on the corresponding filter weight norm, or on the average percentage of zeros in the output (a code sketch of the first variant is given after this list).
2. Group sparsity, widely used to smooth the pruning process after training.
3. Imposing sparsity constraints on channel-wise scaling factors during training, whose magnitudes are then used for channel pruning.
4. Minimizing the next layer's feature reconstruction error to determine which channels to keep.
5. Optimizing the reconstruction error of the final response layer and propagating an "importance score" for each channel.
6. Using Taylor expansion to approximate each channel's influence on the final loss.
7. Analyzing the intrinsic correlation within each layer and pruning redundant channels.
8. A layer-wise compensate filter pruning algorithm that improves commonly adopted heuristic pruning metrics.
9. Allowing pruned filters to recover during the training process.
10. Pruning certain structures in the network based on the current input.
11. The Lottery Ticket Hypothesis: certain connections, together with their randomly initialized weights, can achieve accuracy comparable to the original network when trained in isolation.
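As a concrete example of item 1, below is a minimal sketch of channel selection by filter L1 norm in the spirit of Li et al. (2017): rank each layer's output channels by the L1 norm of their filters and keep the top fraction. Building the physically smaller network from the kept indices is omitted, and the helper names are our own, not code from the cited work.

```python
import torch
import torch.nn as nn

def keep_channels_by_l1(conv: nn.Conv2d, keep_ratio: float) -> torch.Tensor:
    """Return sorted indices of the output channels to keep in one conv layer."""
    # conv.weight has shape (out_channels, in_channels, kH, kW);
    # the L1 norm over the last three dims scores each filter.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(keep_ratio * conv.out_channels))
    keep_idx = torch.argsort(scores, descending=True)[:n_keep]
    return torch.sort(keep_idx).values

def channel_plan(model: nn.Module, keep_ratio: float = 0.5):
    """Decide, layer by layer, which channels survive pruning."""
    return {name: keep_channels_by_l1(m, keep_ratio)
            for name, m in model.named_modules()
            if isinstance(m, nn.Conv2d)}
```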

Zhu & Gupta (2018) show that a small dense model cannot reach the same accuracy as a pruned large sparse model with the same memory footprint. By contrast, this paper finds that fine-tuning the pruned model with inherited weights is no better than training it from scratch; the resulting pruned architecture is more likely what brings the benefit.

METHODOLOGY

We first divide network pruning methods into two categories. In a pruning pipeline, the target pruned model's architecture can be determined by either a human (i.e., predefined) or the pruning algorithm (i.e., automatic).

Network pruning methods fall into two categories: in a pruning pipeline, the architecture of the pruned target model can be determined either by a human (i.e., predefined) or by the pruning algorithm (i.e., automatic).

When a human predefines the target architecture, a common criterion is the ratio of channels to prune in each layer. For example, we may want to prune 50% of the channels in each layer of VGG. In this case, no matter which specific channels are pruned, the pruned target architecture remains the same, because the pruning algorithm only locally prunes the least important 50% of channels in each layer. In practice, the ratio in each layer is usually selected through empirical studies or heuristics. Examples of predefined structured pruning include Li et al. (2017), Luo et al. (2017), He et al. (2017b) and He et al. (2018a).

When the target architecture is automatically determined by a pruning algorithm, it is usually based on a pruning criterion that globally compares the importance of structures (e.g., channels) across layers. Examples of automatic structured pruning include Liu et al. (2017), Huang & Wang (2018), Molchanov et al. (2016) and Suau et al. (2018).

When a human predefines the pruned structure, the common approach is to set a channel pruning ratio for each layer, which is rather restrictive; when the architecture is determined automatically, the importance of channels is usually compared globally across layers.
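The difference can be made concrete with a small sketch. The predefined variant below fixes the keep ratio per layer, while the automatic variant applies one global threshold to the BatchNorm scaling factors across all layers, in the spirit of Network Slimming (Liu et al., 2017), so the per-layer widths fall out of the global comparison. Both helpers are our own illustration, not code from the paper.

```python
import torch
import torch.nn as nn

def predefined_widths(model: nn.Module, keep_ratio: float = 0.5):
    """Predefined: keep a fixed fraction of channels in every layer."""
    return {name: max(1, int(keep_ratio * m.num_features))
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}

def automatic_widths(model: nn.Module, prune_fraction: float = 0.5):
    """Automatic: compare |gamma| of all BN layers against one global threshold."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    k = max(1, int(prune_fraction * gammas.numel()))
    threshold = gammas.kthvalue(k).values
    return {name: int((m.weight.detach().abs() > threshold).sum())
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```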

Unstructured pruning (Han et al., 2015; Molchanov et al., 2017; Louizos et al., 2018) also falls in the category of automatic methods, where the positions of pruned weights are determined by the training process and the pruning algorithm, and it is usually not possible to predefine the positions of zeros before training starts.

Unstructured pruning (Han et al., 2015; Molchanov et al., 2017; Louizos et al., 2018) also belongs to the automatic category: the positions of the pruned weights are determined by the training process and the pruning algorithm, and it is usually impossible to predefine the positions of the zeros before training starts.

The rest of the paper describes the training setup for the experiments, which is not covered in detail in these notes.
