Neural Network Model Compression & Hands-On Tutorial: Unstructured Pruning

码字神经元

Last modified 2023-05-05 11:24:16


First published 2023-05-05 11:23:14

Article link: https://blog.csdn.net/qq_59572329/article/details/130487956


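State-of-the-art deep learning techniques rely on over-parameterized models that are hard to deploy. Biological neural networks, by contrast, are known to use efficient sparse connectivity. To reduce memory, battery, and hardware consumption without sacrificing accuracy, it is important to identify the best techniques for compressing a model by reducing its number of parameters. This in turn lets you deploy lightweight models on device and protect privacy by keeping computation on the device.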

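On the research side, pruning is used to study the differences in learning dynamics between over-parameterized and under-parameterized networks, to study the role of lucky sparse subnetworks and initializations (the "lottery ticket" line of work), to serve as a destructive neural architecture search technique, and more.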

In this tutorial you will learn how to use torch.nn.utils.prune to sparsify your neural network, and how to extend it to implement your own custom pruning technique.

1. Imports & defining a simple network

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# By PyTanAI.2023.05.05.
import torch
from torch import nn
import torch.nn.utils.prune as prune
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build a LeNet-style network

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square conv kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5x5 image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, int(x.nelement() / x.shape[0]))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
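
The `16 * 5 * 5` input size of fc1 follows from how the two convolution and pooling stages shrink the image. The sketch below traces the shapes through the same layers; the 28×28 single-channel input is an assumption (the article does not state the input size), chosen because it yields the 5×5 feature map mentioned in the code comment.

```python
import torch
from torch import nn
import torch.nn.functional as F

# Assumption: one 28x28 single-channel image (MNIST-sized input).
x = torch.randn(1, 1, 28, 28)

conv1 = nn.Conv2d(1, 6, 3)    # same layer shapes as in LeNet above
conv2 = nn.Conv2d(6, 16, 3)

x = conv1(x)                               # 3x3 conv, no padding: 28 -> 26
shape_after_conv1 = tuple(x.shape)
x = F.max_pool2d(F.relu(x), 2)             # 2x2 pooling: 26 -> 13
shape_after_pool1 = tuple(x.shape)
x = conv2(x)                               # 13 -> 11
x = F.max_pool2d(F.relu(x), 2)             # 11 -> 5 (floor division)
shape_after_pool2 = tuple(x.shape)

flat = x.view(x.shape[0], -1)              # flatten everything except the batch dim
print(shape_after_conv1, shape_after_pool1, shape_after_pool2, tuple(flat.shape))
```

The flattened width is 16 × 5 × 5 = 400, which is what `nn.Linear(16 * 5 * 5, 120)` expects.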

2. Getting the module to prune

model = LeNet().to(device=device)
module = model.conv1
print(list(module.named_parameters()))      # weight of shape 6×1×3×3 plus a bias of 6 values
print("Buffers:", list(module.buffers()))   # no buffers yet

Output:

[('weight', Parameter containing:
tensor([[[[ 0.2881, -0.1194,  0.1755],
          [ 0.3237, -0.2420,  0.2648],
          [ 0.2360, -0.1297, -0.3236]]],


        [[[ 0.2951, -0.2125,  0.0272],
          [ 0.3029,  0.0733,  0.2472],
          [-0.1719,  0.0348,  0.1115]]],


        [[[ 0.3079,  0.0183, -0.2626],
          [-0.2539, -0.1793,  0.1540],
          [ 0.2064, -0.2641, -0.2036]]],


        [[[-0.1372,  0.1855, -0.1717],
          [ 0.0961, -0.2446, -0.0918],
          [-0.1925,  0.2286,  0.0260]]],


        [[[ 0.3091,  0.0959,  0.3065],
          [ 0.0555, -0.0527,  0.1545],
          [ 0.1176, -0.2485,  0.2863]]],


        [[[ 0.0796,  0.1389,  0.0098],
          [ 0.0660,  0.1612,  0.0292],
          [-0.2270,  0.2746,  0.2107]]]], device='cuda:0', requires_grad=True)), ('bias', Parameter containing:
tensor([-0.3192,  0.0184,  0.3248,  0.0720,  0.0273,  0.2885], device='cuda:0',
       requires_grad=True))]
Buffers: []

As the output shows, .named_parameters() returns the weight and bias of the conv1 module, and the module has no buffers yet.

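A note on the difference between a Buffer and a Parameter in PyTorch. Broadly, a Torch model has two kinds of tensors that need to be saved:

One is updated by the optimizer during backpropagation and is called a parameter.
The other is not updated by the optimizer during backpropagation and is called a buffer.

The first kind is returned by model.parameters(); the second kind is returned by model.buffers(). Because what we save for a model is the OrderedDict returned by state_dict, both kinds differ in whether they get updated, but both still have to end up in that OrderedDict.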
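
The distinction can be illustrated with a toy module. This is a minimal sketch: the `Scaler` class and its attribute names are hypothetical and exist only for this example.

```python
import torch
from torch import nn

class Scaler(nn.Module):
    """Hypothetical toy module with one parameter and one buffer."""
    def __init__(self):
        super().__init__()
        # Parameter: returned by parameters(), so an optimizer can update it.
        self.scale = nn.Parameter(torch.ones(3))
        # Buffer: saved with the model but never handed to the optimizer.
        self.register_buffer("offset", torch.zeros(3))

    def forward(self, x):
        return x * self.scale + self.offset

m = Scaler()
param_names = [name for name, _ in m.named_parameters()]
buffer_names = [name for name, _ in m.named_buffers()]
state_keys = sorted(m.state_dict().keys())
print(param_names, buffer_names, state_keys)
```

Both names show up in state_dict, but only `scale` appears among the parameters.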

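3. Pruning a module (core)

Pruning a module takes three steps:

step1. Pick a pruning method from torch.nn.utils.prune, or define your own (by subclassing BasePruningMethod)
step2. Specify the module to prune and the name of the parameter within it
step3. Pass the arguments that the chosen method requires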
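
For the "define your own" branch of step 1, a minimal sketch of a custom method follows. The class `EveryOtherPruning` and its rule (zero every other entry of the flattened tensor) are made up for illustration; the parts that come from PyTorch are the `BasePruningMethod` base class, the `PRUNING_TYPE` attribute, the `compute_mask` hook, and the `apply` class method.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

class EveryOtherPruning(prune.BasePruningMethod):
    """Hypothetical method: prune every other entry of the flattened tensor."""
    PRUNING_TYPE = "unstructured"   # acts on individual entries, not whole channels

    def compute_mask(self, t, default_mask):
        # Start from the existing mask and zero out entries 0, 2, 4, ...
        mask = default_mask.clone()
        mask.view(-1)[::2] = 0
        return mask

layer = nn.Linear(4, 3)                      # step 2: the module to prune
EveryOtherPruning.apply(layer, "weight")     # step 2/3: parameter name (no extra args here)

flat_mask = layer.weight_mask.view(-1)
print(flat_mask)
```

After `apply`, the layer has the same `weight_orig` parameter and `weight_mask` buffer that the built-in methods create.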

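3.1 Randomly pruning weight

Here we demonstrate one unstructured pruning method, random_unstructured(), applied to the conv1 module with a pruning ratio of 30%.

# Randomly prune (unstructured) 30% of the entries of weight in module (conv1)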
prune.random_unstructured(module,name='weight',amount=0.3)

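Pruning removes weight from the module's parameters and replaces it with a new parameter named weight_orig (that is, the original parameter name with "_orig" appended). weight_orig stores the unpruned version of the tensor. bias was not pruned, so it stays as it is. Let's see what the module's parameters look like now.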

print(list(module.named_parameters()))  # parameters after pruning

Output:

[('bias', Parameter containing:
tensor([-0.1190, -0.1459,  0.1585, -0.1844, -0.0692,  0.0761], device='cuda:0',
       requires_grad=True)), ('weight_orig', Parameter containing:
tensor([[[[ 0.1597,  0.1880, -0.2685],
          [ 0.2026,  0.2884, -0.1808],
          [ 0.0732,  0.0585, -0.0769]]],


        [[[ 0.0520, -0.2434, -0.1346],
          [-0.2128, -0.2137, -0.0478],
          [-0.2456, -0.2241,  0.1080]]],


        [[[-0.0738, -0.2010,  0.1235],
          [ 0.2351, -0.1867, -0.1614],
          [-0.2364, -0.1841,  0.0431]]],


        [[[-0.1626, -0.0424,  0.0527],
          [-0.2939, -0.0562, -0.0746],
          [ 0.2492,  0.1073,  0.0602]]],


        [[[-0.1094,  0.2420, -0.3171],
          [ 0.1193,  0.0303,  0.0832],
          [ 0.0308, -0.2415, -0.1136]]],


        [[[-0.3254,  0.0593, -0.2013],
          [ 0.1987,  0.1115,  0.1455],
          [-0.1936, -0.3215, -0.1646]]]], device='cuda:0', requires_grad=True))]

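As the output shows, the original weight has been replaced by weight_orig, while bias is still an ordinary parameter. (The numeric values differ from the earlier listing; the listings in this article appear to come from separate runs.)

The pruning mask generated by the method chosen above is saved as a module buffer named weight_mask (that is, the original parameter name with "_mask" appended).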

print(list(module.buffers()))

Output (weight_mask):

[tensor([[[[1., 1., 0.],
          [0., 1., 1.],
          [0., 0., 1.]]],


        [[[1., 0., 1.],
          [0., 1., 1.],
          [1., 1., 1.]]],


        [[[1., 1., 1.],
          [1., 1., 1.],
          [1., 0., 1.]]],


        [[[1., 1., 0.],
          [0., 0., 1.],
          [0., 1., 1.]]],


        [[[0., 1., 1.],
          [0., 1., 1.],
          [1., 1., 1.]]],


        [[[1., 0., 0.],
          [1., 1., 1.],
          [1., 0., 1.]]]], device='cuda:0')]

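As the output shows, the buffers now contain a 6×1×3×3 mask in which 0 marks a pruned entry and 1 marks a kept entry; 16 of the 54 entries are 0, which matches round(0.3 × 54). The pruned tensor is the element-wise product of the mask and weight_orig, and it is stored as module.weight. Note that weight is no longer a parameter of the module at this point; it is just an attribute.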

print(module.weight)

Output (the pruned weight):

Pruned weight: tensor([[[[ 0.1777, -0.0000, -0.0000],
          [ 0.2343,  0.2673,  0.1665],
          [-0.2993, -0.1947, -0.0000]]],


        [[[-0.2483,  0.1792, -0.1995],
          [-0.0000, -0.0000, -0.0000],
          [-0.1821,  0.0735,  0.0000]]],


        [[[-0.0000, -0.0000, -0.2375],
          [-0.1405,  0.0000, -0.0604],
          [-0.0660,  0.1085,  0.1807]]],


        [[[ 0.0000, -0.1220, -0.2022],
          [-0.2078,  0.0000,  0.0000],
          [-0.1470, -0.0000,  0.3173]]],


        [[[ 0.2120, -0.1476, -0.2939],
          [ 0.3090, -0.1572,  0.1311],
          [-0.2457,  0.0000,  0.1040]]],


        [[[-0.2145,  0.1023,  0.2987],
          [ 0.1153,  0.2309,  0.1024],
          [ 0.1326,  0.0000, -0.2388]]]], device='cuda:0',
       grad_fn=<MulBackward0>)
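
The relationship between weight, weight_orig, and weight_mask can be checked directly. This is a small sketch on a standalone layer with the same shape as conv1; the seed is set only to make the run repeatable.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)                       # repeatability only
conv = nn.Conv2d(1, 6, 3)                  # same shape as conv1 above
prune.random_unstructured(conv, name="weight", amount=0.3)

mask = conv.weight_mask
n_pruned = int((mask == 0).sum())
print(n_pruned, mask.numel())              # 16 54, i.e. round(0.3 * 54) entries pruned

# weight is recomputed from the stored original and the mask.
same = torch.equal(conv.weight, conv.weight_orig * mask)

# weight is now a plain attribute; weight_orig is the registered parameter.
param_names = [name for name, _ in conv.named_parameters()]
print(same, param_names)
```

Which 16 entries get pruned varies from run to run, but the count is fixed by the amount argument.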

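Finally, look at ._forward_pre_hooks. When a module is pruned, it gets one forward_pre_hook for each of its pruned parameters; the hook applies the mask before every forward pass. So far we have only pruned the original parameter named weight, so only one hook is present.

print(module._forward_pre_hooks)	# only one hook, for weight

Output: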

OrderedDict([(0, <torch.nn.utils.prune.RandomUnstructured object at 0x000001C6303E0640>)])
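
To see what the hook does, the sketch below changes weight_orig in place (standing in for an optimizer step) and checks that weight only catches up once a forward pass has run. The input size 8×8 is arbitrary.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(1, 6, 3)
prune.random_unstructured(conv, name="weight", amount=0.3)
n_hooks = len(conv._forward_pre_hooks)     # one hook, for "weight"

# Stand-in for an optimizer update: modify the underlying parameter in place.
with torch.no_grad():
    conv.weight_orig.add_(1.0)

# The weight attribute was computed before the update, so it is out of date.
in_sync_before = torch.equal(conv.weight, conv.weight_orig * conv.weight_mask)

conv(torch.randn(1, 1, 8, 8))              # the forward pass runs the pre-hook first

# The hook has re-applied the mask to the updated weight_orig.
in_sync_after = torch.equal(conv.weight, conv.weight_orig * conv.weight_mask)
print(n_hooks, in_sync_before, in_sync_after)   # 1 False True
```

This is why pruned entries stay at zero during training even though weight_orig keeps being updated.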

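3.2 L1-norm pruning of bias

For completeness, we now prune bias as well, to see how the module's parameters, buffers, hooks, and attributes change. Above we used random pruning; here we use the L1 norm to prune the single entry of bias with the smallest absolute value. Note that an integer amount is an absolute number of entries, whereas a float between 0 and 1 is a fraction of the entries.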

prune.l1_unstructured(module, name="bias", amount=1)
print(list(module.named_parameters()))

Output (bias-L1):

bias-L1: [('weight_orig', Parameter containing:
tensor([[[[-0.1476,  0.0597,  0.1942],
          [ 0.1331, -0.0948, -0.2089],
          [-0.2600, -0.0888,  0.1752]]],


        [[[ 0.2840, -0.2354, -0.1865],
          [ 0.1032,  0.2911, -0.2829],
          [-0.1034, -0.1090,  0.2705]]],


        [[[ 0.2686,  0.2454, -0.2184],
          [-0.2400,  0.1100,  0.2278],
          [ 0.1445, -0.2764, -0.2458]]],


        [[[ 0.3074, -0.1116, -0.1135],
          [-0.2895, -0.0530, -0.1952],
          [-0.0451,  0.2353,  0.0073]]],


        [[[ 0.3321, -0.0071,  0.0327],
          [-0.1292,  0.3307,  0.0603],
          [ 0.0867, -0.1897,  0.2040]]],


        [[[ 0.0789,  0.0687,  0.3195],
          [ 0.1242, -0.1244, -0.2228],
          [-0.0605,  0.0980, -0.2067]]]], device='cuda:0', requires_grad=True)), ('bias_orig', Parameter containing:
tensor([ 0.0286,  0.1601,  0.0488,  0.1729, -0.0704, -0.1949], device='cuda:0',
       requires_grad=True))]

As you can see, weight has been replaced by weight_orig and bias by bias_orig.

print(list(module.named_buffers()))
print(module.bias)
print(module._forward_pre_hooks)

Output:

[('weight_mask', tensor([[[[1., 1., 0.],
          [0., 1., 1.],
          [1., 1., 0.]]],


        [[[0., 1., 1.],
          [1., 1., 1.],
          [0., 1., 1.]]],


        [[[1., 0., 1.],
          [1., 0., 1.],
          [1., 1., 0.]]],


        [[[1., 1., 1.],
          [1., 1., 1.],
          [0., 1., 0.]]],


        [[[0., 0., 1.],
          [1., 1., 0.],
          [1., 0., 0.]]],


        [[[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 0.]]]], device='cuda:0')), ('bias_mask', tensor([1., 1., 1., 0., 1., 1.], device='cuda:0'))]
tensor([-0.3009,  0.1505,  0.0685, -0.0000,  0.1766,  0.1367], device='cuda:0',
       grad_fn=<MulBackward0>)
OrderedDict([(0, <torch.nn.utils.prune.RandomUnstructured object at 0x0000020BF8250640>), (1, <torch.nn.utils.prune.L1Unstructured object at 0x0000020BF8250310>)])
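
The effect of the amount argument in L1 pruning is easier to see with known values. In this sketch the bias values are made up for illustration, so that the entry with the smallest absolute value is obvious.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(1, 6, 3)
# Illustrative bias values (not from the article); 0.02 has the smallest magnitude.
with torch.no_grad():
    conv.bias.copy_(torch.tensor([-0.30, 0.15, 0.07, 0.02, 0.18, 0.14]))

prune.l1_unstructured(conv, name="bias", amount=1)    # int: prune exactly 1 entry
print(conv.bias_mask)                                  # tensor([1., 1., 1., 0., 1., 1.])

frac = nn.Conv2d(1, 6, 3)
prune.l1_unstructured(frac, name="bias", amount=0.5)  # float: prune 50% of the entries
n_frac_pruned = int((frac.bias_mask == 0).sum())
print(n_frac_pruned)                                   # 3
```

Passing amount=1.0 (a float) would therefore prune every entry, which is easy to confuse with amount=1.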

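4. Summary

This example first built a LeNet-style network. For unstructured pruning we picked its conv1 module, whose parameters are a 6×1×3×3 convolution weight (54 values) and a bias of 6 values. Using the pruning methods in torch.nn.utils.prune, we applied 30% random unstructured pruning to weight and L1 unstructured pruning to bias.

Core functions and attributes used in this article:

module.named_parameters(): wrap in list() to inspect it
module.buffers(): wrap in list() to inspect it
module.weight: prints the module's weight directly
module.bias: prints the module's bias directly
prune.random_unstructured(): random unstructured pruning
prune.l1_unstructured(): L1 unstructured pruning

Reference: the official PyTorch pruning tutorial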
