0. Results First (Updated)
------------ Original model --------------
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 16, 62, 62] 448
ReLU-2 [-1, 16, 62, 62] 0
Conv2d-3 [-1, 32, 60, 60] 4,640
ReLU-4 [-1, 32, 60, 60] 0
Conv2d-5 [-1, 64, 58, 58] 18,496
ReLU-6 [-1, 64, 58, 58] 0
AdaptiveAvgPool2d-7 [-1, 64, 5, 5] 0
Linear-8 [-1, 10] 16,010
================================================================
Total params: 39,594
Trainable params: 39,594
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 5.99
Params size (MB): 0.15
Estimated Total Size (MB): 6.19
----------------------------------------------------------------
------------ Original model accuracy --------------
Test set: Average loss: 0.0135, Accuracy: 7048/10000 (70%)
----------- Compressed model --------------
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 16, 62, 62] 448
ReLU-2 [-1, 16, 62, 62] 0
Conv2d-3 [-1, 8, 62, 62] 128
Conv2d-4 [-1, 12, 60, 60] 864
Conv2d-5 [-1, 32, 60, 60] 416
ReLU-6 [-1, 32, 60, 60] 0
Conv2d-7 [-1, 18, 60, 60] 576
Conv2d-8 [-1, 24, 58, 58] 3,888
Conv2d-9 [-1, 64, 58, 58] 1,600
ReLU-10 [-1, 64, 58, 58] 0
AdaptiveAvgPool2d-11 [-1, 64, 5, 5] 0
Linear-12 [-1, 5] 8,000
Linear-13 [-1, 10] 60
================================================================
Total params: 15,980
Trainable params: 15,980
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 7.67
Params size (MB): 0.06
Estimated Total Size (MB): 7.78
----------------------------------------------------------------
----------- Compressed model accuracy --------------
Test set: Average loss: 0.0268, Accuracy: 4586/10000 (46%)
----------- Accuracy after fine-tuning --------------
Test set: Average loss: 0.0139, Accuracy: 6901/10000 (69%)
----------- Model size comparison --------------
As the results show, with essentially no change in final accuracy, the model shrank from 158 KB to 68 KB, about 43% of its original size.
Of course, for many carefully designed networks this method may not help.
1. Background
When we train a neural network, its architecture is mostly designed by experience, so the learned weights often contain a lot of redundancy. We can factorize the weight matrices and keep only their principal components to compress the model.
2. SVD Decomposition (nn.Linear)
Decomposing a linear layer is straightforward. For an m×n matrix, the full SVD yields three matrices of shapes m×m, m×n, and n×n. When m ≠ n, PyTorch simplifies the factors it returns; for a 5×4 matrix the decomposition gives:
# The last row of the 5x4 diagonal matrix is all zeros,
# so the last column of U is simply dropped
U.shape = torch.Size([5, 4])
# The diagonal matrix is returned as a 1-D vector of singular
# values and must be expanded back into a matrix later
S.shape = torch.Size([4])
V.shape = torch.Size([4, 4])
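These reduced shapes, and the fact that the factors still recover the original matrix, can be checked directly (a quick sketch using a random 5×4 matrix, not part of the original post):

```python
import torch

torch.manual_seed(0)
W = torch.randn(5, 4)

# torch.svd returns the reduced factorization: for an m x n matrix
# with k = min(m, n), U is m x k, S is a length-k vector, V is n x k
U, S, V = torch.svd(W)
print(U.shape, S.shape, V.shape)
# torch.Size([5, 4]) torch.Size([4]) torch.Size([4, 4])

# U @ diag(S) @ V.T recovers the original matrix
print(torch.allclose(W, U @ torch.diag(S) @ V.T, atol=1e-5))  # True
```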
Given U, S, and V, keep only the first l components, the ones with the largest singular values and therefore the greatest influence on the matrix product, to obtain a low-rank factorization. In the experiment l=1 was not viable, as accuracy could not be recovered, while l=5 recovered accuracy well. Then build new linear layers from the truncated U, S, and V. See link 1 for the underlying theory.
import torch
from torch import nn

def compress_linear(model, l=1):
    # W (m x n) is approximated by U1 (m x l) @ diag(S1) (l x l) @ V1.T (l x n)
    U, S, V = torch.svd(model.weight)
    U1 = U[:, :l]
    S1 = S[:l]
    V1 = V[:, :l]
    V2 = torch.mm(torch.diag(S1), V1.T)  # shape (l, n)
    # y = x @ W.T + b  becomes  y = (x @ V2.T) @ U1.T + b:
    # the first layer maps n -> l, the second maps l -> m and carries the bias
    new_model = nn.Sequential(nn.Linear(V2.shape[1], V2.shape[0], bias=False),
                              nn.Linear(U1.shape[1], U1.shape[0], bias=True))
    new_model[0].weight = nn.Parameter(V2)
    new_model[1].weight = nn.Parameter(U1)
    new_model[1].bias = nn.Parameter(model.bias)
    return new_model
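To see why the two stacked Linear layers reproduce the original one, here is a self-contained check (my own illustration with a hypothetical small layer, not the author's code): factor the weight, load the factors into the two layers, and compare outputs. At full rank (l = min(m, n)) the factorization is exact; with smaller l it becomes an approximation.

```python
import torch
from torch import nn

torch.manual_seed(0)
fc = nn.Linear(8, 5)   # W has shape 5 x 8
l = 5                  # full rank here, so the factorization is exact

U, S, V = torch.svd(fc.weight)
U1, S1, V1 = U[:, :l], S[:l], V[:, :l]
V2 = torch.diag(S1) @ V1.T   # shape (l, 8)

# y = x @ W.T + b  ==  (x @ V2.T) @ U1.T + b
seq = nn.Sequential(nn.Linear(8, l, bias=False), nn.Linear(l, 5))
with torch.no_grad():
    seq[0].weight.copy_(V2)
    seq[1].weight.copy_(U1)
    seq[1].bias.copy_(fc.bias)

x = torch.randn(3, 8)
print(torch.allclose(fc(x), seq(x), atol=1e-5))  # True
```

With l = 5 the factored pair has 8*5 + 5*5 + 5 = 70 parameters versus 5*8 + 5 = 45 for the original, so compression only pays off when l is well below min(m, n), as with the 1600→10 classifier head above.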