NNI Debugging Notes: Pruning

@daviiid

Last modified 2023-12-26 16:34:14

Category: AI. Tags: pruning, artificial intelligence, deep learning, neural networks, edge computing

First published 2023-12-26 16:31:41

Article link: https://blog.csdn.net/wb3533366/article/details/135221746

Model compression mainly consists of two parts, pruning and quantization. This post covers pruning.

Pruning

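Pruning is arguably the most important and most effective step in model compression. It only works if the parameters are redundant: if the model is already compact, or even underfitting, there is no point in pruning it. The most sensible approach is to over-parameterize first and prune afterwards.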

Configuring the config

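The config defines which layers to prune, the sparse ratio, and so on. The sparse ratio determines how heavily a layer is compressed. For example, 'sparse_ratio': 0.5 means at most 50% of the parameters are pruned; the actual ratio may be lower.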

standard_config = {
    'op_names': ['fc1', 'fc2'],
    'target_names': ['weight', 'bias'],
    'target_settings': {
        'weight': {
            'sparse_ratio': 0.8,
            'max_sparse_ratio': None,
            'min_sparse_ratio': None,
            'sparse_threshold': None,
            'global_group_id': None,
            'dependency_group_id': None,
            'granularity': 'default',
            'apply_method': 'mul',
        },
        'bias': {
            'align': {
                'target_name': 'weight',
                'dims': [0],
            },
            'apply_method': 'mul',
        }
    }
}

The above is a complete config. In practice you usually only need to care about the following keys:

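'op_names': the operator names; print(model) shows the name of each operator;
'target_names': the names of the compression targets; input, weight, bias and output are supported, and weight and bias are the usual choice;
'sparse_ratio': described above; takes a value from 0.0 to 1.0;
'max_sparse_ratio' and 'min_sparse_ratio': the maximum and minimum sparse ratios, each from 0.0 to 1.0;
'sparse_threshold': an absolute sparsity threshold. It differs across algorithms and models, so a usable value has to be found empirically from statistics; generally, the larger the value, the higher the resulting sparse ratio;
'global_group_id': used together with sparse_ratio to prune all ops sharing the same id under one overall sparse ratio; ops with the same group id must have the same sparse_ratio. This can be a little hard to grasp, so here is an example: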

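With the config below, fc1 and fc3 belong to the same group id and each has 100 parameters. After sparsification, fc1 keeps 60 parameters, so its actual sparse_ratio is 0.4; fc3 keeps 40 parameters, so its actual sparse_ratio is 0.6; the group's overall sparse_ratio is 0.5: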

config_list = [{
    'op_names': ['fc1'],
    'sparse_ratio': 0.5,
    'global_group_id': 'linear_group_1'
}, {
    'op_names': ['fc3'],
    'sparse_ratio': 0.5,
    'global_group_id': 'linear_group_1'
}]
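The arithmetic in this example can be reproduced with a small pure-Python sketch. It is only one plausible reading of group-wise pruning (pool the magnitudes of every op in the group and cut at a shared threshold), not NNI's actual implementation; `global_group_masks` and `sparsity` are hypothetical helper names, and each layer is shrunk to 10 parameters to keep the numbers readable.

```python
def global_group_masks(layers, sparse_ratio):
    """Prune the smallest-magnitude weights across all layers of one group.

    layers: dict mapping an op name to a flat list of weights (illustrative data).
    Returns a dict of 0/1 masks with the same layout (0 = pruned).
    """
    # Rank the magnitudes of every layer in the group together.
    pooled = sorted(abs(w) for ws in layers.values() for w in ws)
    num_pruned = int(len(pooled) * sparse_ratio)
    if num_pruned == 0:
        return {name: [1] * len(ws) for name, ws in layers.items()}
    # Shared threshold: the largest magnitude that still gets pruned.
    # (Ties at the threshold would prune slightly more than requested.)
    threshold = pooled[num_pruned - 1]
    return {name: [0 if abs(w) <= threshold else 1 for w in ws]
            for name, ws in layers.items()}


def sparsity(mask):
    """Fraction of entries that were pruned."""
    return mask.count(0) / len(mask)


# fc1 has 4 small weights out of 10, fc3 has 6 small weights out of 10.
layers = {
    'fc1': [0.01, -0.02, 0.03, 0.04, 0.9, -0.8, 0.7, 0.95, 0.85, 0.75],
    'fc3': [0.05, 0.06, -0.07, 0.08, 0.09, 0.10, 0.6, -0.65, 0.55, 0.5],
}
masks = global_group_masks(layers, sparse_ratio=0.5)
for name, mask in masks.items():
    print(name, sparsity(mask))
```

Here fc1 ends up at 0.4 and fc3 at 0.6, while exactly half of the 20 pooled weights are pruned, which mirrors the 60/40 split described above.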

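'dependency_group_id': mainly used to keep the pruning indices consistent across several layers, i.e. to make sure the channels at the same positions are pruned. It is typically needed when the outputs of several layers are combined by add or mul;
granularity: the dimension along which pruning is done. Most options are easy to understand; only per_channel is a bit confusing in the official description. Reading it word by word:

First, this option applies when target_name is input or output;
Second, "-1 dimension" means the last dimension;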

default: The pruner will auto determine using which kind of granularity, usually consistent with the paper.

in_channel: The pruner will do pruning on the weight parameters 1 dimension.

out_channel: The pruner will do pruning on the weight parameters 0 dimension.

per_channel: The pruner will do pruning on the input/output -1 dimension.

list of integer: Block sparse will be applied. For example, [4, 4] will apply 4x4 block sparse on the last two dimensions of the weight parameters.
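To make the weight-related granularities concrete, here is a minimal sketch on a 2D weight of shape [out, in], stored as nested lists. It assumes an L1 score per group and is not NNI code: `channel_mask` and `block_mask` are hypothetical helpers, the block version assumes the shape is divisible by the block size, and per_channel (which targets input/output tensors) is not covered.

```python
def channel_mask(weight, granularity, sparse_ratio):
    """Mask whole rows (out_channel, dim 0) or whole columns (in_channel, dim 1)."""
    rows, cols = len(weight), len(weight[0])
    if granularity == 'out_channel':
        scores = [sum(abs(v) for v in weight[r]) for r in range(rows)]
    elif granularity == 'in_channel':
        scores = [sum(abs(weight[r][c]) for r in range(rows)) for c in range(cols)]
    else:
        raise ValueError('unsupported granularity: %r' % (granularity,))
    num_pruned = int(len(scores) * sparse_ratio)
    # Indices of the lowest-scoring channels.
    pruned = set(sorted(range(len(scores)), key=lambda i: scores[i])[:num_pruned])
    if granularity == 'out_channel':
        return [[0 if r in pruned else 1 for _ in range(cols)] for r in range(rows)]
    return [[0 if c in pruned else 1 for c in range(cols)] for _ in range(rows)]


def block_mask(weight, block, sparse_ratio):
    """Mask whole bh x bw blocks, e.g. block=[2, 2]."""
    bh, bw = block
    rows, cols = len(weight), len(weight[0])
    origins = [(r, c) for r in range(0, rows, bh) for c in range(0, cols, bw)]

    def score(origin):
        r0, c0 = origin
        return sum(abs(weight[r][c])
                   for r in range(r0, r0 + bh) for c in range(c0, c0 + bw))

    num_pruned = int(len(origins) * sparse_ratio)
    mask = [[1] * cols for _ in range(rows)]
    for r0, c0 in sorted(origins, key=score)[:num_pruned]:
        for r in range(r0, r0 + bh):
            for c in range(c0, c0 + bw):
                mask[r][c] = 0
    return mask


W = [
    [0.1, 0.2, 3.0, 3.0],
    [0.1, 0.1, 2.0, 2.0],
    [5.0, 5.0, 0.3, 0.1],
    [4.0, 4.0, 0.2, 0.2],
]
out_mask = channel_mask(W, 'out_channel', 0.5)   # zeroes rows 0 and 1
in_mask = channel_mask(W, 'in_channel', 0.5)     # zeroes columns 2 and 3
blk_mask = block_mask(W, [2, 2], 0.5)            # zeroes the two weakest 2x2 blocks
```

The same weight matrix gives three different masks at the same sparse ratio, which is the practical meaning of the granularity setting.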

Pruner

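This is the core part: the pruners are different pruning algorithms (essentially different sparsity metrics). The flow is: compute the metric on the weights of a layer channel by channel, sort the results, derive a threshold from the configured sparse_ratio, treat channels below the threshold as invalid, and finally remove those channels together with the associated channels in other layers.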
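The flow just described (score, sort, threshold, drop, propagate to the dependent layer) can be sketched for two stacked linear layers. This is an illustration with the L1 norm as the metric, not NNI's implementation; `prune_linear_pair` is a hypothetical helper, w1 has shape [hidden, in] and w2 has shape [out, hidden].

```python
def prune_linear_pair(w1, w2, sparse_ratio):
    """Remove weak output channels of w1 and the matching input channels of w2."""
    # 1. Score every output channel (row) of the first layer by its L1 norm.
    scores = [sum(abs(v) for v in row) for row in w1]
    # 2. Sort the scores and derive the threshold from sparse_ratio.
    num_pruned = int(len(scores) * sparse_ratio)
    ranked = sorted(scores)
    threshold = ranked[num_pruned - 1] if num_pruned > 0 else float('-inf')
    # 3. Channels at or below the threshold are treated as invalid.
    keep = [i for i, s in enumerate(scores) if s > threshold]
    # 4. Drop those rows from w1 and the corresponding columns from w2,
    #    because w2 consumes exactly the channels that w1 produces.
    new_w1 = [w1[i] for i in keep]
    new_w2 = [[row[i] for i in keep] for row in w2]
    return keep, new_w1, new_w2


w1 = [
    [1.0, -2.0, 0.5],
    [0.1, 0.0, -0.1],
    [0.3, -0.2, 0.1],
    [2.0, 1.0, -1.0],
]
w2 = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
keep, new_w1, new_w2 = prune_linear_pair(w1, w2, sparse_ratio=0.5)
print(keep)            # indices of the surviving hidden channels
print(len(new_w1), len(new_w2[0]))
```

With these numbers the hidden dimension shrinks from 4 to 2, and both layers stay shape-compatible because the same indices are removed on each side.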

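Level Pruner: crude and simple, it sorts weights by absolute value. The idea is that "no algorithm is the best algorithm"; whether it works well is doubtful;
L1 Norm Pruner: ranks channels by the L1 norm of their parameters;
L2 Norm Pruner: same as L1, but with the L2 norm;
FPGM Pruner: uses the geometric median as the metric;
Slim Pruner: probably the most classic and effective method. It adds an L1 regularization term on the scale parameter of every BN layer, forcing most scale values toward 0 (the smaller the value, the less the corresponding conv channel contributes); the rest of the flow is the same as above. Recommended as a baseline;
There are others as well, such as Taylor FO Weight Pruner, Linear Pruner, AGP Pruner and Movement Pruner. They are all broadly similar; try them if you are interested;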
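The metrics differ mainly in which channel they consider least important. The sketch below compares L1, L2 and an FPGM-style score on four toy filters. It is a simplified illustration rather than NNI's code: `fpgm_like_score` approximates "closest to the geometric median" by the smallest sum of Euclidean distances to the other filters, which is an assumption made here for brevity.

```python
import math


def l1_norm(f):
    return sum(abs(v) for v in f)


def l2_norm(f):
    return math.sqrt(sum(v * v for v in f))


def fpgm_like_score(filters, i):
    """Sum of Euclidean distances from filter i to all other filters.

    A low value means the filter sits near the "center" of the others,
    so it is considered the most replaceable.
    """
    return sum(math.dist(filters[i], other)
               for j, other in enumerate(filters) if j != i)


def least_important(scores):
    """Index of the lowest score, i.e. the first candidate for pruning."""
    return min(range(len(scores)), key=lambda i: scores[i])


# Toy filters: index 1 is tiny, indices 2 and 3 are near-duplicates.
filters = [[3.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 0.9]]

by_l1 = least_important([l1_norm(f) for f in filters])
by_l2 = least_important([l2_norm(f) for f in filters])
by_fpgm = least_important([fpgm_like_score(filters, i) for i in range(len(filters))])
print(by_l1, by_l2, by_fpgm)
```

The norm-based metrics pick the small filter (index 1), while the FPGM-style score picks one of the near-duplicates (index 3): norms measure magnitude, the geometric-median idea measures redundancy.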

Pruning workflow

Define the model, optimizer, etc.

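import torch
import torch.nn.functional as F
from torch.optim import SGD

# define the model (TorchModel, device and trainer are assumed to be the helpers from the NNI quickstart example)
model = TorchModel().to(device)
optimizer = SGD(model.parameters(), 1e-2)
criterion = F.nll_loss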

Define the config

config_list = [{
    'op_types': ['Linear', 'Conv2d'],
    'exclude_op_names': ['fc3'],
    'sparse_ratio': 0.5
}]

Define the pruner

from nni.compression.pruning import L1NormPruner
pruner = L1NormPruner(model, config_list)

Compress the model

# compress the model and generate the masks
_, masks = pruner.compress()

Speed up the model (actually remove the masked channels)

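from nni.compression.speedup import ModelSpeedup

# remove the pruner's wrappers so speedup works on the original modules
pruner.unwrap_model()
# trace the model with a dummy MNIST-shaped input and shrink the masked layers
ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()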

Finetune the compressed model (optional, but recommended)

optimizer = SGD(model.parameters(), 1e-2)
for epoch in range(3):
    trainer(model, optimizer, criterion)

Final notes

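Not every type of block can be pruned; more complex ones such as CSPNet may not work;
The sparse ratio needs tuning; lower is not always better. It can be combined with HPO for automatic tuning;
Always finetune; the improvement is quite noticeable;
It is best to put a constraint on the channel count or FLOPs (this requires modifying NNI), for example requiring a multiple of 64 or 32. Many ASICs have very small caches, so alignment improves the hit rate and therefore resource utilization, which in the end can improve on-device inference performance to some extent;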
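The channel-alignment constraint from the last point could look roughly like this. It is only the rounding logic, written as a hypothetical helper (`align_kept_channels`); NNI does not expose it, and wiring it into a pruner is the part that requires modifying NNI. Rounding up is an assumption chosen here so that alignment never prunes more than the requested ratio.

```python
def align_kept_channels(total_channels, sparse_ratio, multiple=32):
    """Round the number of kept channels up to a multiple of `multiple`."""
    kept = total_channels - int(total_channels * sparse_ratio)
    # Integer ceiling division, then scale back to a channel count.
    aligned = -(-kept // multiple) * multiple
    # Never keep more channels than the layer has. If the layer itself is not
    # a multiple of `multiple`, the capped result is not aligned either.
    return min(aligned, total_channels)


def effective_sparse_ratio(total_channels, kept_channels):
    """Sparse ratio that actually results after alignment."""
    return 1 - kept_channels / total_channels


for total, ratio, multiple in [(256, 0.5, 32), (256, 0.55, 32), (512, 0.7, 64)]:
    kept = align_kept_channels(total, ratio, multiple)
    print(total, ratio, kept, round(effective_sparse_ratio(total, kept), 3))
```

Note that alignment lowers the effective sparse ratio: asking for 0.7 on a 512-channel layer with a multiple of 64 keeps 192 channels, i.e. an actual ratio of 0.625.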