(10-9) Large Model Optimization Algorithms and Techniques: Pruning


码农三叔


Article link: https://blog.csdn.net/asd343442/article/details/135813176


10.7.6 Pruning Optimization Techniques

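Pruning reduces a model's size and computational cost by removing unnecessary connections, neurons, or layers from a neural network. It comes in two forms: structured and unstructured. Structured pruning removes whole units such as neurons, filters, channels, or layers, whereas unstructured pruning removes individual weights (connections) and leaves the tensor shapes unchanged. Pruning is usually carried out by alternating training and pruning over several iterations, and the pruned model is typically fine-tuned afterwards to recover accuracy.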
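To make the distinction concrete, the following minimal NumPy sketch applies both styles to a small random weight matrix. The helper names (prune_unstructured, prune_structured_columns), the matrix shape, and the 50% sparsity target are illustrative assumptions and do not come from any library.

```python
import numpy as np

def prune_unstructured(weights, sparsity):
    """Zero the `sparsity` fraction of individual weights with the smallest magnitude."""
    k = int(round(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Magnitude of the k-th smallest weight; everything at or below it is zeroed.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def prune_structured_columns(weights, sparsity):
    """Zero whole columns (output neurons) with the smallest L2 norm."""
    n_cols = weights.shape[1]
    k = int(round(sparsity * n_cols))  # number of columns to remove
    norms = np.linalg.norm(weights, axis=0)
    drop = np.argsort(norms)[:k]  # indices of the weakest columns
    pruned = weights.copy()
    pruned[:, drop] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 6))

w_unstructured = prune_unstructured(w, sparsity=0.5)
w_structured = prune_structured_columns(w, sparsity=0.5)

print("unstructured zeros:", int((w_unstructured == 0).sum()))
print("structured zero columns:", int((w_structured == 0).all(axis=0).sum()))
```

Both variants zero half of the entries here, but only the structured one produces whole columns that could be physically removed to shrink the matrix.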

1. Pruning with TensorFlow

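TensorFlow provides a pruning API (in the TensorFlow Model Optimization toolkit) that slims a model by zeroing out weight parameters, reducing its storage footprint and compute cost without sacrificing much accuracy. The following example uses this API.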

Example 10-1: Pruning a neural network model with TensorFlow (source path: daima/10/jian.py)

The example file jian.py is implemented in the following steps.

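(1) Import the TensorFlow Model Optimization library and define helper functions for building the model, saving pretrained weights, and measuring the compressed model size: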

import tensorflow as tf
import numpy as np
import tensorflow_model_optimization as tfmot

# %load_ext tensorboard  # IPython/Jupyter magic; uncomment only when running in a notebook.

import tempfile

input_shape = [20]
x_train = np.random.randn(1, 20).astype(np.float32)
# Draw a random integer class label in [0, 20) and one-hot encode it.
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 20, size=1), num_classes=20)

def setup_model():
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(20, input_shape=input_shape),
      tf.keras.layers.Flatten()
  ])
  return model

def setup_pretrained_weights():
  model = setup_model()

  model.compile(
      loss=tf.keras.losses.categorical_crossentropy,
      optimizer='adam',
      metrics=['accuracy']
  )

  model.fit(x_train, y_train)

  _, pretrained_weights = tempfile.mkstemp('.tf')

  model.save_weights(pretrained_weights)

  return pretrained_weights

def get_gzipped_model_size(model):
  # Returns size of gzipped model, in bytes.
  import os
  import zipfile

  _, keras_file = tempfile.mkstemp('.h5')
  model.save(keras_file, include_optimizer=False)

  _, zipped_file = tempfile.mkstemp('.zip')
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(keras_file)

  return os.path.getsize(zipped_file)

setup_model()
pretrained_weights = setup_pretrained_weights()

(2) Define the model

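Pruning the whole model (Sequential and Functional APIs). Tips for better model accuracy:

Try pruning only some layers, skipping those that hurt accuracy the most.
Fine-tuning with pruning generally works better than training with pruning from scratch.

To make the whole model train with pruning, apply tfmot.sparsity.keras.prune_low_magnitude to the model: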

base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended.
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)
model_for_pruning.summary()

Running this prints:

Layer (type)                Output Shape              Param #   
=================================================================
 prune_low_magnitude_dense_  (None, 20)                822       
 2 (PruneLowMagnitude)                                           
                                                                 
 prune_low_magnitude_flatte  (None, 20)                1         
 n_2 (PruneLowMagnitude)                                         
                                                                 
=================================================================
Total params: 823 (3.22 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 403 (1.58 KB)
_________________________________________________________________
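The parameter counts in this summary can be reproduced by hand. The sketch below assumes that the pruning wrapper adds, for each pruned weight tensor, a mask of the same shape and one scalar threshold, plus one scalar pruning-step counter per wrapped layer. The helper name wrapped_param_counts is hypothetical.

```python
def wrapped_param_counts(weight_sizes, pruned_sizes):
    """Return (trainable, non_trainable) counts for one wrapped layer.

    weight_sizes: element counts of the layer's own trainable weights.
    pruned_sizes: element counts of the weights selected for pruning.
    """
    trainable = sum(weight_sizes)
    masks = sum(pruned_sizes)        # one mask entry per pruned weight element
    thresholds = len(pruned_sizes)   # one scalar threshold per pruned tensor
    pruning_step = 1                 # one step counter per wrapper
    return trainable, masks + thresholds + pruning_step

# Dense(20) on 20 inputs: kernel 20*20, bias 20; only the kernel is pruned.
dense = wrapped_param_counts([20 * 20, 20], [20 * 20])
# Flatten has no weights, so its wrapper only carries the step counter.
flatten = wrapped_param_counts([], [])

print(dense, flatten)
print("total:", sum(dense) + sum(flatten))
```

Under these assumptions the arithmetic gives 420 trainable and 403 non-trainable parameters, 823 in total, which matches the summary above.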

(3) Prune some layers (Sequential and Functional APIs)

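Pruning a model can hurt accuracy, so you can prune selected layers to explore the trade-off between accuracy, speed, and model size. Fine-tuning with pruning generally works better than training from scratch. Prefer pruning later layers over earlier ones, and avoid pruning critical layers such as attention mechanisms. The code below prunes only the Dense layers.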

# Create a base model.
base_model = setup_model()
base_model.load_weights(pretrained_weights)  # Optional, but recommended for model accuracy.

# Helper that applies `prune_low_magnitude` so that only Dense layers train with pruning.
def apply_pruning_to_dense(layer):
    if isinstance(layer, tf.keras.layers.Dense):
        return tfmot.sparsity.keras.prune_low_magnitude(layer)
    return layer

# Use `tf.keras.models.clone_model` to apply `apply_pruning_to_dense`
# to each layer of the model.
model_for_pruning = tf.keras.models.clone_model(
    base_model,
    clone_function=apply_pruning_to_dense,
)

model_for_pruning.summary()

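This yields model_for_pruning, a model in which only the Dense layer is wrapped for pruning, which helps when exploring the trade-off between accuracy, speed, and model size. Running this prints: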

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 prune_low_magnitude_dense_  (None, 20)                822       
 3 (PruneLowMagnitude)                                           
                                                                 
 flatten_3 (Flatten)         (None, 20)                0         
                                                                 
=================================================================
Total params: 822 (3.21 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 402 (1.57 KB)
_________________________________________________________________

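This example uses the layer type to decide what to prune, but the simplest way to prune one specific layer is to set its name attribute and look for that name in clone_function. For example, printing the name of the first layer of base_model gives: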

dense_3

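Wrapping the layer inline while defining the model, as in the next two examples, is more readable but may give lower accuracy. It is not compatible with fine-tuning with pruning, which is why it may be less accurate than the examples above that do support fine-tuning: prune_low_magnitude can be applied while defining the initial model, but loading the weights afterwards does not work in these examples. The Functional API version is: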

i = tf.keras.Input(shape=(20,))
x = tfmot.sparsity.keras.prune_low_magnitude(tf.keras.layers.Dense(10))(i)
o = tf.keras.layers.Flatten()(x)
model_for_pruning = tf.keras.Model(inputs=i, outputs=o)
model_for_pruning.summary()

Running this prints:

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 20)]              0         
                                                                 
 prune_low_magnitude_dense_  (None, 10)                412       
 4 (PruneLowMagnitude)                                           
                                                                 
 flatten_4 (Flatten)         (None, 10)                0         
                                                                 
=================================================================
Total params: 412 (1.61 KB)
Trainable params: 210 (840.00 Byte)
Non-trainable params: 202 (812.00 Byte)
_________________________________________________________________

The Sequential API version is:

model_for_pruning = tf.keras.Sequential([
  tfmot.sparsity.keras.prune_low_magnitude(tf.keras.layers.Dense(20, input_shape=input_shape)),
  tf.keras.layers.Flatten()
])


model_for_pruning.summary()

Running this prints:

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 prune_low_magnitude_dense_  (None, 20)                822       
 5 (PruneLowMagnitude)                                           
                                                                 
 flatten_5 (Flatten)         (None, 20)                0         
                                                                 
=================================================================
Total params: 822 (3.21 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 402 (1.57 KB)
_________________________________________________________________

(4) Prune a custom Keras layer or modify which parts of a layer are pruned

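A common mistake is to prune the bias, which usually harms model accuracy too much. tfmot.sparsity.keras.PrunableLayer serves two use cases:

Pruning a custom Keras layer.
Modifying which parts of a built-in Keras layer are pruned.

For example, by default the API prunes only the kernel of a Dense layer. The example for this step also prunes the bias, by subclassing Dense together with PrunableLayer and returning both the kernel and the bias from get_prunable_weights. The summary of the resulting model is: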

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 prune_low_magnitude_my_den  (None, 20)                843       
 se_layer (PruneLowMagnitud                                      
 e)                                                              
                                                                 
 flatten_6 (Flatten)         (None, 20)                0         
                                                                 
=================================================================
Total params: 843 (3.30 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 423 (1.66 KB)
_________________________________________________________________

(5) Train the model

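Train the model with Model.fit(). The tfmot.sparsity.keras.UpdatePruningStep callback must be passed during training, because it advances the pruning step. To help debug training, also add the tfmot.sparsity.keras.PruningSummaries callback.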

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights) # optional but recommended for model accuracy
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

log_dir = tempfile.mkdtemp()
callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    # Log sparsity and other metrics in Tensorboard.
    tfmot.sparsity.keras.PruningSummaries(log_dir=log_dir)
]

model_for_pruning.compile(
      loss=tf.keras.losses.categorical_crossentropy,
      optimizer='adam',
      metrics=['accuracy']
)

model_for_pruning.fit(
    x_train,
    y_train,
    callbacks=callbacks,
    epochs=2,
)

(6) Custom training loop

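In a custom training loop, call the hooks of the tfmot.sparsity.keras.UpdatePruningStep callback by hand during training. To help debug training, use the tfmot.sparsity.keras.PruningSummaries callback.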

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights)  # Optional, but recommended for model accuracy.
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

# Boilerplate.
loss = tf.keras.losses.categorical_crossentropy
optimizer = tf.keras.optimizers.Adam()
log_dir = tempfile.mkdtemp()
unused_arg = -1
epochs = 2
batches = 1  # Hardcoded in this example; the number of batches cannot change.

# Non-boilerplate: wire the pruning callbacks to the model by hand.
model_for_pruning.optimizer = optimizer
step_callback = tfmot.sparsity.keras.UpdatePruningStep()
step_callback.set_model(model_for_pruning)
log_callback = tfmot.sparsity.keras.PruningSummaries(log_dir=log_dir)  # Log sparsity and other metrics in TensorBoard.
log_callback.set_model(model_for_pruning)

step_callback.on_train_begin()  # Run the pruning callback.
for _ in range(epochs):
    log_callback.on_epoch_begin(epoch=unused_arg)  # Run the logging callback.
    for _ in range(batches):
        step_callback.on_train_batch_begin(batch=unused_arg)  # Run the pruning callback.

        with tf.GradientTape() as tape:
            logits = model_for_pruning(x_train, training=True)
            loss_value = loss(y_train, logits)
        # Compute and apply the gradients once the forward pass has been recorded.
        grads = tape.gradient(loss_value, model_for_pruning.trainable_variables)
        optimizer.apply_gradients(zip(grads, model_for_pruning.trainable_variables))

    step_callback.on_epoch_end(batch=unused_arg)  # Run the pruning callback.

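The code above trains with pruning in a custom loop. It defines a base model, loads the pretrained weights, and wraps the model with low-magnitude pruning. It then sets up the usual loss function and optimizer, attaches the optimizer and the two pruning callbacks to the model, and invokes the callback hooks by hand at the start of training, at the start of each epoch and batch, and at the end of each epoch. PruningSummaries writes the sparsity and other metrics to log_dir so that they can be inspected in TensorBoard.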
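Conceptually, the pruning step in such a loop does two things: it advances a step counter and, at the steps chosen by the schedule, recomputes a binary mask that zeroes the smallest-magnitude weights. The NumPy sketch below imitates that behaviour on a toy linear-regression problem. It is not how tfmot is implemented; the variable names, the fixed 70% target, and the 50-step update frequency are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: only 3 of the 10 input features actually matter.
true_w = np.zeros(10)
true_w[:3] = [2.0, -3.0, 1.5]
x = rng.normal(size=(200, 10))
y = x @ true_w

w = rng.normal(scale=0.1, size=10)   # trainable weights
mask = np.ones_like(w)               # 1 = keep, 0 = pruned
target_sparsity = 0.7                # assumed final sparsity
frequency = 50                       # assumed: update the mask every 50 steps
lr = 0.1

for step in range(300):
    # "Pruning callback": periodically recompute the mask from weight magnitudes.
    if step > 0 and step % frequency == 0:
        k = int(round(target_sparsity * w.size))
        smallest = np.argsort(np.abs(w))[:k]
        mask = np.ones_like(w)
        mask[smallest] = 0.0

    # Ordinary gradient step on the masked weights.
    pred = x @ (w * mask)
    grad = x.T @ (pred - y) / len(y)
    w -= lr * grad * mask   # pruned weights receive no update
    w *= mask               # keep pruned weights at exactly zero

print("nonzero weights:", int(np.count_nonzero(w)))
print("max error on true weights:", float(np.abs(w - true_w).max()))
```

Training for a while before the first mask update gives the informative weights time to grow, which is the same reason pruning after (or during) fine-tuning tends to work better than pruning at random initialization.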

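To improve the accuracy of the pruned model, start by reading the tfmot.sparsity.keras.prune_low_magnitude API documentation to understand what a pruning schedule is and the math behind each type of pruning schedule.

Notes:

Choose a learning rate that is neither too high nor too low while the model is being pruned, and treat the pruning schedule as a hyperparameter.
As a quick test, try pruning the model to its final sparsity at the start of training by using a tfmot.sparsity.keras.ConstantSparsity schedule with begin_step set to 0. You may get lucky and obtain good results.
Do not prune too often, so that the model has time to recover. The pruning schedule provides a reasonable default frequency.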
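As an illustration of what a pruning schedule computes, here is a minimal pure-Python sketch of a constant schedule and a polynomial-decay schedule. The formula is the commonly used gradual-pruning rule with a cubic exponent. The function names and the step values are assumptions chosen for illustration, and the code is not the tfmot implementation.

```python
def constant_sparsity(step, target, begin_step=0):
    """Apply the target sparsity from begin_step onward, and none before it."""
    return target if step >= begin_step else 0.0

def polynomial_decay_sparsity(step, initial, final, begin_step, end_step, power=3):
    """Ramp sparsity from `initial` to `final` between begin_step and end_step."""
    # Fraction of the ramp completed so far, clamped to [0, 1].
    progress = min(1.0, max(0.0, (step - begin_step) / (end_step - begin_step)))
    return final + (initial - final) * (1.0 - progress) ** power

for step in (0, 250, 500, 1000):
    s = polynomial_decay_sparsity(step, initial=0.0, final=0.8,
                                  begin_step=0, end_step=1000)
    print(step, round(s, 3))
```

With a cubic exponent the sparsity rises quickly at first, when many weights are redundant, and flattens out as it approaches the final value, leaving the model more steps to recover near the end.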

(7) Checkpointing and deserialization

When checkpointing, the optimizer step must be preserved. This means that you can checkpoint with a Keras HDF5 model, but not with Keras HDF5 weights.

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights)  # Optional, but recommended for model accuracy.
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)
_, keras_model_file = tempfile.mkstemp('.h5')


# Checkpoint: saving the optimizer is necessary (include_optimizer=True is the default).
model_for_pruning.save(keras_model_file, include_optimizer=True)

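The code above defines a model, wraps it for pruning, creates a temporary HDF5 file (.h5), and saves the wrapped model together with its optimizer in that file. Saving the optimizer state at checkpoint time is essential for resuming training from the correct step.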

The following loading code works only for the HDF5 model format (not for HDF5 weights or other formats):

with tfmot.sparsity.keras.prune_scope():
  loaded_model = tf.keras.models.load_model(keras_model_file)

loaded_model.summary()

Running this prints:

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 prune_low_magnitude_dense_  (None, 20)                822       
 6 (PruneLowMagnitude)                                           
                                                                 
 prune_low_magnitude_flatte  (None, 20)                1         
 n_7 (PruneLowMagnitude)                                         
                                                                 
=================================================================
Total params: 823 (3.22 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 403 (1.58 KB)
_________________________________________________________________

(8) Deploy the pruned model

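Export the model with size compression. The code below defines a model, wraps it for pruning, and then strips the pruning wrappers so that the model can be exported. It prints the summary of the stripped model and compares the compressed sizes of the model before and after stripping, which shows how much pruning can reduce model size. Both strip_pruning and a standard compression algorithm are needed to see the size benefit.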

# Define the model.
base_model = setup_model()
base_model.load_weights(pretrained_weights)  # Optional, but recommended for model accuracy.
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model)

# Typically you would train the model here.

model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

print("final model")
model_for_export.summary()

print("\n")
print("Size of gzipped pruned model without stripping: %.2f bytes" % (get_gzipped_model_size(model_for_pruning)))

print("Size of gzipped pruned model with stripping: %.2f bytes" % (get_gzipped_model_size(model_for_export)))

Running this prints:

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_7 (Dense)             (None, 20)                420       
                                                                 
 flatten_8 (Flatten)         (None, 20)                0         
                                                                 
=================================================================
Total params: 420 (1.64 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
Size of gzipped pruned model without stripping: 3498.00 bytes
WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
Size of gzipped pruned model with stripping: 2958.00 bytes
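The reason a compression step is needed can be seen without TensorFlow. In the sketch below, which uses only NumPy and the standard zlib module, a pruned weight matrix occupies exactly as many raw bytes as the dense one, and the saving appears only after compression. The matrix size and the 80% sparsity level are assumptions for illustration.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
dense = rng.normal(size=(256, 256)).astype(np.float32)

# Magnitude-prune roughly 80% of the weights (assumed sparsity level).
threshold = np.quantile(np.abs(dense), 0.8)
pruned = np.where(np.abs(dense) >= threshold, dense, 0.0).astype(np.float32)

# Zeros still take 4 bytes each until a compressor removes the redundancy.
dense_size = len(zlib.compress(dense.tobytes()))
pruned_size = len(zlib.compress(pruned.tobytes()))

print("raw bytes (dense and pruned):", dense.nbytes, pruned.nbytes)
print("compressed dense:", dense_size)
print("compressed pruned:", pruned_size)
```

The same logic applies to the saved model files above: stripping removes the extra pruning variables, and the zip step then exploits the runs of zeros in the remaining weights.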

(9) Hardware-specific optimizations

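Once different backends support pruning as a way to improve latency, using block sparsity can improve latency on certain hardware. Increasing the block size lowers the peak sparsity that can be reached for a given target accuracy, but latency can still improve despite this.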

base_model = setup_model()


# For a CPU with 128-bit registers and 8-bit quantized weights, a 1x16 block size is a good choice,
# because a block fits a register exactly.
pruning_params = {'block_size': [1, 16]}
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(base_model, **pruning_params)

model_for_pruning.summary()

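The code above defines a model and prunes it with block sparsity to suit specific hardware. The 1x16 block size used here matches a CPU with 128-bit registers and 8-bit quantized weights. Running it prints the summary of the wrapped model: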

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 prune_low_magnitude_dense_  (None, 20)                822       
 8 (PruneLowMagnitude)                                           
                                                                 
 prune_low_magnitude_flatte  (None, 20)                1         
 n_9 (PruneLowMagnitude)                                         
                                                                 
=================================================================
Total params: 823 (3.22 KB)
Trainable params: 420 (1.64 KB)
Non-trainable params: 403 (1.58 KB)
_________________________________________________________________
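Block sparsity prunes whole tiles of a weight matrix instead of single entries. The NumPy sketch below shows one way this could work: it scores each 1x16 block by its mean absolute value (an assumed scoring rule) and zeroes the lowest-scoring blocks. The helper name block_prune and the 8x32 matrix are hypothetical, and the sketch assumes the matrix tiles exactly into blocks.

```python
import numpy as np

# 128-bit register / 8-bit weights = 16 weights per register.
register_bits = 128
weight_bits = 8
print("weights per register:", register_bits // weight_bits)

def block_prune(weights, block_shape, sparsity):
    """Zero whole blocks whose mean absolute value is smallest."""
    bh, bw = block_shape
    rows, cols = weights.shape
    assert rows % bh == 0 and cols % bw == 0, "sketch assumes exact tiling"
    # View the matrix as a grid of (bh x bw) blocks and score each block.
    blocks = weights.reshape(rows // bh, bh, cols // bw, bw)
    scores = np.abs(blocks).mean(axis=(1, 3))
    k = int(round(sparsity * scores.size))      # number of blocks to remove
    drop = np.argsort(scores, axis=None)[:k]
    block_mask = np.ones(scores.size)
    block_mask[drop] = 0.0
    block_mask = block_mask.reshape(scores.shape)
    # Expand the per-block mask back to per-weight resolution.
    mask = np.repeat(np.repeat(block_mask, bh, axis=0), bw, axis=1)
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 32))
w_blocked = block_prune(w, block_shape=(1, 16), sparsity=0.5)
print("zero weights:", int((w_blocked == 0).sum()))
```

Because every zeroed run is 16 weights long, a kernel can skip a full register load at a time, which is what makes this layout attractive on such hardware. The cost is that a block is removed even when it contains a few large weights.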

2. Pruning with PyTorch

PyTorch provides pruning APIs in the torch.nn.utils.prune module. Commonly used functions include:

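torch.nn.utils.prune.l1_unstructured(module, name, amount): prunes the entries of the named parameter that have the smallest absolute value (lowest L1 norm). module is the module to prune, name is the name of the parameter within it, and amount is the fraction (float) or absolute number (int) of entries to prune.
torch.nn.utils.prune.random_unstructured(module, name, amount): prunes randomly chosen entries of the named parameter. The arguments have the same meaning as above.
torch.nn.utils.prune.global_unstructured(parameters, pruning_method, amount): prunes a group of parameters globally, ranking all of their entries together. parameters is an iterable of (module, name) pairs, pruning_method is the pruning method, and amount (passed as a keyword argument) is the fraction or number of entries to prune across the whole group.
torch.nn.utils.prune.remove(module, name): removes the pruning reparametrization from the module, making the pruning permanent in the named parameter.
torch.nn.utils.prune.custom_from_mask(module, name, mask): prunes the named parameter with a user-supplied mask.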

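These functions cover different pruning strategies and needs. Choosing a suitable pruning method and amount lets you zero out model weights, which reduces the effective model size and computation.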

Example 10-2: Pruning a neural network model with PyTorch (source path: daima/10/pyjian.py)

The example file pyjian.py is implemented as follows.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Define a simple neural network model.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Create a model instance.
model = Net()

# Print the original model structure.
print(model)

# Select the parameters to prune.
parameters_to_prune = (
    (model.fc1, 'weight'),
    (model.fc2, 'weight'),
    (model.fc3, 'weight')
)

prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2
)

# Print the model structure after pruning.
print(model)

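The code above first defines a simple neural network. It then calls prune.global_unstructured from the torch.nn.utils.prune module on the weights of all three linear layers. We list the parameters to prune, choose L1Unstructured (smallest absolute value first) as the pruning method, and set the amount to 20%. Running it prints: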

Net(
  (fc1): Linear(in_features=784, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=128, bias=True)
  (fc3): Linear(in_features=128, out_features=10, bias=True)
)
Net(
  (fc1): Linear(in_features=784, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=128, bias=True)
  (fc3): Linear(in_features=128, out_features=10, bias=True)
)

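The output shows the model structure before and after pruning. The two are identical, but not because pruning was skipped: global_unstructured applies the masks immediately, and print(model) only lists layer types and shapes. Unstructured pruning zeroes individual weights without changing any tensor shape. Each pruned layer now stores its original values in weight_orig and a binary mask in weight_mask, and exposes their product as weight, so the effect shows up in the fraction of zeros in, for example, model.fc1.weight rather than in the printed structure.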
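To show what "global" means in this call, the sketch below reproduces the selection rule in plain NumPy rather than PyTorch, so that the single shared threshold is visible. The three arrays are random stand-ins with the same shapes as the weights of fc1, fc2, and fc3; their scales are assumptions, not PyTorch's actual initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the three weight matrices, drawn at different scales
# so that the layers differ in typical magnitude.
layers = {
    "fc1": rng.normal(scale=0.05, size=(256, 784)),
    "fc2": rng.normal(scale=0.10, size=(128, 256)),
    "fc3": rng.normal(scale=0.20, size=(10, 128)),
}

amount = 0.2  # fraction of all weights to remove

# One threshold for the pooled magnitudes of every layer.
all_magnitudes = np.concatenate([np.abs(w).ravel() for w in layers.values()])
k = int(round(amount * all_magnitudes.size))
threshold = np.partition(all_magnitudes, k - 1)[k - 1]  # k-th smallest magnitude

# True = keep, False = pruned.
masks = {name: np.abs(w) > threshold for name, w in layers.items()}

for name, mask in masks.items():
    print(name, "sparsity:", round(1.0 - float(mask.mean()), 3))
total_zeros = sum(int((~m).sum()) for m in masks.values())
print("global sparsity:", total_zeros / all_magnitudes.size)
```

The overall sparsity is 20%, but the per-layer sparsity is not: layers whose weights are small in absolute terms lose a larger share than layers with larger weights. Pruning each layer separately with l1_unstructured would instead remove 20% from every layer.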

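Note: a pruned model can be more efficient at inference time, provided the sparsity is actually exploited (for example by compression or sparse kernels), but the pruning and fine-tuning themselves take place during training. In practice, the pruning strategy and its hyperparameters need to be tuned to the model and dataset.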
