[PyTorch] torch.quantization.quantize_dynamic(): Dynamic Quantization

License: CC 4.0 BY-SA

The `torch.quantization.quantize_dynamic` function

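torch.quantization.quantize_dynamic is PyTorch's dynamic quantization function. It is used to reduce model size and speed up inference, and it targets CPU execution in particular. In recent PyTorch releases the same function is also available as torch.ao.quantization.quantize_dynamic.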

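1. Purpose

Weights are quantized to INT8 ahead of time. Activations are passed between layers as FP32 and are quantized on the fly, inside each quantized layer, at inference time.
Suitable for Transformer, LSTM, GRU, and fully connected networks (Linear layers).
After quantization the model usually runs faster on CPU and takes less storage.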
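To make the INT8 storage concrete, here is a minimal sketch of per-tensor quantization with `torch.quantize_per_tensor`. The symmetric scale choice (largest magnitude mapped to 127) is an assumption made for illustration; it is not necessarily the exact scheme `quantize_dynamic` selects internally.

```python
import torch

# A small FP32 tensor standing in for a layer's weights.
w = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])

# Assumed symmetric scheme: the largest magnitude maps to 127.
scale = w.abs().max().item() / 127
w_q = torch.quantize_per_tensor(w, scale=scale, zero_point=0, dtype=torch.qint8)

w_int = w_q.int_repr()      # the stored INT8 values
w_back = w_q.dequantize()   # approximate FP32 reconstruction
max_err = (w - w_back).abs().max().item()

print(w_int.dtype, max_err)
```

The reconstruction error per element stays within about half a quantization step (`scale / 2`), which is why accuracy often degrades only slightly.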

2. Syntax

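torch.quantization.quantize_dynamic(
    model,  # the model to quantize
    qconfig_spec=None,  # which layers to quantize (default: all supported layers)
    dtype=torch.qint8,  # target data type (default: INT8)
    mapping=None,  # optional mapping from float module types to their quantized replacements
    inplace=False  # whether to modify the original model
)

Parameter	Description
`model`	The model to quantize
`qconfig_spec`	The layers to quantize, given as a set of module types (e.g. `{nn.Linear, nn.LSTM}`) or as a dict mapping module types or submodule names to a qconfig; by default all supported layers
`dtype`	Target data type (default `torch.qint8`; `torch.float16` is also accepted)
`mapping`	Optional mapping from float module types to the quantized module types that replace them (default `None`, which uses the built-in mapping)
`inplace`	Whether to modify the original model (default `False`, which returns a new model)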
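The effect of `inplace=False` can be checked with a short sketch. The two-layer `nn.Sequential` below is a made-up model used only for this illustration.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only to show the call.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)).eval()

# inplace=False (the default) returns a converted copy.
q_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# The original still holds float nn.Linear layers; the copy does not.
print(isinstance(model[0], nn.Linear), isinstance(q_model[0], nn.Linear))
# Layers that are not listed in qconfig_spec (here nn.ReLU) are left alone.
print(isinstance(q_model[1], nn.ReLU))
```

Passing `inplace=True` instead would swap the layers inside `model` itself, which saves memory but discards the float version.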

3. Example: dynamic quantization of `nn.Linear` layers

import torch
import torch.nn as nn
import torch.quantization

# Define a simple fully connected network
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Create the model; dynamic quantization targets inference, so use eval mode
model = SimpleModel()
model.eval()

# Dynamically quantize the fully connected layers
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized_model)

Output (the exact repr may differ slightly between PyTorch versions)

SimpleModel(
  (fc1): DynamicQuantizedLinear(in_features=128, out_features=64, dtype=torch.qint8)
  (fc2): DynamicQuantizedLinear(in_features=64, out_features=10, dtype=torch.qint8)
)

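Explanation

fc1 and fc2 are replaced by DynamicQuantizedLinear modules, which store their weights as INT8.
At compute time the input arrives as FP32 and is quantized to INT8 on the fly, the matrix multiplication runs in INT8, and the result is returned as FP32. This is where the inference speedup comes from.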
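As a quick sanity check, the sketch below compares the FP32 and quantized outputs on random input. The `nn.Sequential` model mirrors the layer sizes of `SimpleModel` but is a stand-in so that the snippet is self-contained; the random input and the seed are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in with the same layer sizes as SimpleModel above.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
q_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(4, 128)  # arbitrary FP32 input batch
with torch.no_grad():
    y_fp32 = model(x)
    y_int8 = q_model(x)

# The quantized model still consumes and produces FP32 tensors.
print(y_int8.dtype)
max_diff = (y_fp32 - y_int8).abs().max().item()
print(max_diff)
```

The difference is typically small but nonzero; on a real model it is worth measuring the task metric rather than raw output differences.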

4. Example: dynamic quantization of an LSTM

import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=128, hidden_size=64, num_layers=1, batch_first=True)
        self.fc = nn.Linear(64, 10)

    def forward(self, x):
        x, _ = self.lstm(x)
        # Use the output of the last time step
        x = self.fc(x[:, -1, :])
        return x

# Create the LSTM model
model = LSTMModel()
model.eval()

# Apply dynamic quantization
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)

print(quantized_model)

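Explanation

Both nn.LSTM and nn.Linear are replaced by their dynamically quantized counterparts, which improves compute efficiency.
This suits inference optimization for NLP models such as Transformer, LSTM, and BERT.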
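A short sketch can confirm that a quantized LSTM model still accepts ordinary FP32 sequences. `TinyLSTM` is a hypothetical model with the same layer sizes as `LSTMModel`, redefined here so the snippet runs on its own; the batch size and sequence length are arbitrary.

```python
import torch
import torch.nn as nn

class TinyLSTM(nn.Module):
    """Hypothetical stand-in with the same layer sizes as LSTMModel."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, 10)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])

model = TinyLSTM().eval()
q_model = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 5, 128)  # (batch, sequence length, features)
with torch.no_grad():
    y = q_model(x)

print(y.shape, y.dtype)
# The float layers were swapped for quantized module classes.
print(type(q_model.lstm) is nn.LSTM, type(q_model.fc) is nn.Linear)
```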

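5. Pros and cons of dynamic quantization

Pros	Cons
Smaller storage (weights go from FP32 to INT8, roughly 4x smaller)	Part of the computation still runs in FP32
Faster CPU inference	Covers only a limited set of layers, mainly `nn.Linear` and recurrent layers such as `nn.LSTM` and `nn.GRU`; convolutions are not supported
No calibration data required	Accuracy can drop slightly relative to the FP32 model, and quantizing activations at runtime adds overhead that static quantization avoids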
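The storage claim can be checked by serializing the state dict before and after quantization. This is a rough sketch: the helper `serialized_size` and the 1024x1024 layer sizes are assumptions chosen so that weights dominate the file size.

```python
import io
import torch
import torch.nn as nn

def serialized_size(m: nn.Module) -> int:
    """Hypothetical helper: bytes used by the saved state dict."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

# Large Linear layers so that weights dominate the serialized size.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024)).eval()
q_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

fp32_bytes = serialized_size(model)
int8_bytes = serialized_size(q_model)
print(f"FP32: {fp32_bytes} bytes, INT8: {int8_bytes} bytes, "
      f"ratio: {fp32_bytes / int8_bytes:.2f}")
```

The ratio lands a little under 4 because biases stay in FP32 and the file carries some metadata. Models with many unquantized layers (embeddings, convolutions) will shrink less.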