Saving and Loading PyTorch Models and Converting .pt to ONNX, All in One Article

圆心不会飞

Last modified 2024-09-10 20:25:15


First published 2024-09-10 18:05:24

Article link: https://blog.csdn.net/qq_43448134/article/details/142102340



1. A First Look at a Convolutional Neural Network

Define a simple convolutional neural network:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvTestNet(nn.Module):
    def __init__(self):
        super(ConvTestNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(256, 120)     # 16 channels * 4 * 4 for a 3x28x28 input
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.gemfield = "gemfield.org"     # plain string attribute
        self.syszux = torch.zeros([1,1])   # plain tensor attribute (not a registered buffer)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 256)                # flatten to (batch, 256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize the model
model = ConvTestNet()
print("model type = ",type(model))
print(model)
print(dir(model))
for name, param in model.named_parameters():
    if 'weight' in name or 'bias' in name:
        print(f"{name}: {param.data.shape}")

Output:

# type of model
model type =  <class '__main__.ConvTestNet'>
# network structure of model
ConvTestNet(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=256, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
# all attributes of model
['T_destination', '__annotations__', '__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_apply', '_backward_hooks', '_backward_pre_hooks', '_buffers', '_call_impl', '_compiled_call_impl', '_forward_hooks', '_forward_hooks_always_called', '_forward_hooks_with_kwargs', '_forward_pre_hooks', '_forward_pre_hooks_with_kwargs', '_get_backward_hooks', '_get_backward_pre_hooks', '_get_name', '_is_full_backward_hook', '_load_from_state_dict', '_load_state_dict_post_hooks', '_load_state_dict_pre_hooks', '_maybe_warn_non_full_backward_hook', '_modules', '_named_members', '_non_persistent_buffers_set', '_parameters', '_register_load_state_dict_pre_hook', '_register_state_dict_hook', '_replicate_for_data_parallel', '_save_to_state_dict', '_slow_forward', '_state_dict_hooks', '_state_dict_pre_hooks', '_version', '_wrapped_call_impl', 'add_module', 'apply', 'bfloat16', 'buffers', 'call_super_init', 'children', 'compile', 'conv1', 'conv2', 'cpu', 'cuda', 'double', 'dump_patches', 'eval', 'extra_repr', 'fc1', 'fc2', 'fc3', 'float', 'forward', 'gemfield', 'get_buffer', 'get_extra_state', 'get_parameter', 'get_submodule', 'half', 'ipu', 'load_state_dict', 'modules', 'named_buffers', 'named_children', 'named_modules', 'named_parameters', 'parameters', 'pool', 'register_backward_hook', 'register_buffer', 'register_forward_hook', 'register_forward_pre_hook', 'register_full_backward_hook', 'register_full_backward_pre_hook', 'register_load_state_dict_post_hook', 'register_module', 'register_parameter', 'register_state_dict_pre_hook', 'requires_grad_', 'set_extra_state', 'share_memory', 'state_dict', 'syszux', 'to', 'to_empty', 'train', 
'training', 'type', 'xpu', 'zero_grad']

# names and shapes of the parameters in model
conv1.weight: torch.Size([6, 3, 5, 5])
conv1.bias: torch.Size([6])
conv2.weight: torch.Size([16, 6, 5, 5])
conv2.bias: torch.Size([16])
fc1.weight: torch.Size([120, 256])
fc1.bias: torch.Size([120])
fc2.weight: torch.Size([84, 120])
fc2.bias: torch.Size([84])
fc3.weight: torch.Size([10, 84])
fc3.bias: torch.Size([10])

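As the output shows, model is an instance of the custom class ConvTestNet, whose parent class is torch.nn.Module (the base class of all neural network modules in PyTorch). The model object therefore has its own attributes (conv1, conv2, fc1, gemfield, syszux, and so on) as well as attributes inherited from the parent class (state_dict, named_parameters, and so on).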

Some commonly used attributes and methods of the model object:

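_modules: an ordered dictionary of the module's direct submodules (convolution, pooling, fully connected layers, etc.), keyed by attribute name. Nested submodules live in each child's own _modules; use modules() or named_modules() to walk the whole tree.
_parameters: an ordered dictionary of the parameters registered directly on this module (such as weights and biases). For ConvTestNet it is empty: the weights are created by the layer constructors (nn.Linear, nn.Conv2d, etc.) and stored in each layer's own _parameters.
_buffers: an ordered dictionary of non-trainable state registered with register_buffer() (for example the running mean and variance in BatchNorm). Like _parameters, it only holds this module's own buffers.
training: bool indicating whether the module is in training mode. Toggled with model.train() or model.eval().
device: nn.Module has no device attribute. The device a model lives on (CPU or GPU) can be read from one of its parameters, e.g. next(model.parameters()).device, and the module can be moved with to().
__init__(): constructor that defines the network layers and initializes their parameters. Usually overridden in subclasses.
forward(input): forward pass, defining how data flows through the network. Must be overridden in subclasses.
train(mode=True): sets the module to training mode. mode defaults to True; passing False switches to evaluation mode.
eval(): sets the module to evaluation mode; equivalent to train(False). Typically used for model evaluation and inference.
to(device): moves the module's parameters and buffers to the specified device (CPU or GPU) and/or dtype; Tensor.to() does the same for a single tensor.
parameters(): returns an iterator over the model's parameters (weights and biases), typically passed to an optimizer.
named_parameters(): returns an iterator yielding (name, parameter) pairs for all parameters in the model.
state_dict(): returns a dictionary containing all of the model's parameters and persistent buffers. Commonly used to save and load models.
load_state_dict(state_dict): copies parameters and buffers from the given state_dict into the model. Commonly used to restore a model's state.
zero_grad(): clears the gradients of all model parameters. Usually called before each optimization step so that gradients do not accumulate across steps.
apply(fn): recursively applies fn to every submodule (as returned by children()) and to the module itself.
children(): returns an iterator over the model's direct child modules.
modules(): returns an iterator over all modules in the network, including the model itself and all nested submodules.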
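The difference between the "direct" containers (_modules, _parameters, children()) and the recursive iterators (modules(), parameters()) is easy to check. The sketch below uses a compact copy of the ConvTestNet defined above (same layers, with gemfield/syszux omitted) and only standard nn.Module members; the commented outputs assume the model is created on the CPU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTestNet(nn.Module):
    """Same layers as ConvTestNet above; gemfield/syszux omitted for brevity."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


model = ConvTestNet()

# _modules / children(): only the direct submodules assigned in __init__
print(list(model._modules.keys()))  # ['conv1', 'pool', 'conv2', 'fc1', 'fc2', 'fc3']
print([type(m).__name__ for m in model.children()])

# _parameters only holds parameters registered on this module itself;
# the layer weights live in each layer's own _parameters
print(len(model._parameters))                # 0
print(list(model.conv1._parameters.keys()))  # ['weight', 'bias']

# modules() includes the model itself, so it yields one more item than children()
print(len(list(model.modules())), len(list(model.children())))  # 7 6

# nn.Module has no .device attribute; read it from a parameter instead
print(next(model.parameters()).device)  # cpu

# eval() sets training=False on the model and, recursively, on every submodule
model.eval()
print(model.training, model.fc1.training)  # False False
```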

2. Saving Models with torch

2.1 Saving the complete model with torch (network structure and weights)

def model_save_whole_test():
    model = ConvTestNet()
    # Pickles the whole module object; the class itself is stored by reference (__main__.ConvTestNet)
    torch.save(model, "./model_all.pt")
model_save_whole_test()
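To check that a whole-model file really restores both structure and weights, a save/load round trip can be run in memory. This is a sketch: io.BytesIO stands in for ./model_all.pt, and weights_only=False is passed because torch.load defaults to weights_only=True from PyTorch 2.6 onward, which refuses to unpickle arbitrary module objects.

```python
import io

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTestNet(nn.Module):
    """Same layers as ConvTestNet above; gemfield/syszux omitted for brevity."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


model = ConvTestNet().eval()
buffer = io.BytesIO()      # in-memory stand-in for "./model_all.pt"
torch.save(model, buffer)  # pickles the module object: structure + weights

buffer.seek(0)
# Needed on PyTorch versions whose torch.load defaults to weights_only=True;
# only unpickle whole models from files you trust.
restored = torch.load(buffer, weights_only=False)
restored.eval()

x = torch.randn(2, 3, 28, 28)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(type(restored).__name__, same)  # ConvTestNet True
```

Loading only works because ConvTestNet is defined in the loading script under the same module path (__main__); without that definition, unpickling fails.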

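2.2 Saving only the weights with torch

def model_save_weight_test():
    model = ConvTestNet()
    # state_dict() holds only the parameters and registered buffers, keyed by name
    torch.save(model.state_dict(), "./model_state_dict.pt")

model_save_weight_test()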
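In the ConvTestNet above, gemfield and syszux are assigned as ordinary attributes, so they show up in dir(model) but not in state_dict(). The toy module below (Demo and its attribute names are hypothetical, chosen to mirror that case) contrasts them with a buffer registered through register_buffer(), which state_dict() does include. By contrast, torch.save(model, ...) pickles the whole object, so such plain attributes do survive a whole-model save.

```python
import torch
import torch.nn as nn


class Demo(nn.Module):
    """Hypothetical toy module, used only to show what state_dict() keeps."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)                        # parameters -> saved
        self.register_buffer("running", torch.zeros(2))  # registered buffer -> saved
        self.syszux = torch.zeros([1, 1])                # plain tensor attribute -> not saved
        self.gemfield = "gemfield.org"                   # plain Python attribute -> not saved


model = Demo()
sd = model.state_dict()
print(type(sd).__name__)            # OrderedDict
print(sorted(sd.keys()))            # ['fc.bias', 'fc.weight', 'running']
print(list(model._buffers.keys()))  # ['running']
```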

3. Loading Models with torch

def model_load_test():
    print("***************whole: struct && weight**********************")
    m1 = torch.load("./model_all.pt", weights_only=False)  # whole-model files need weights_only=False on newer PyTorch
    print("type(m1)=", type(m1))
    for name,param in m1.named_parameters():
        print(f"name = {name}: param = {param.data.shape}")

    print("***************only weight**********************")
    m2 = torch.load( "./model_state_dict.pt" )
    print(" type(m2) =  ",type(m2))
    for k,i in m2.items():
        print(f"key = {k}, value = {i.shape}")
model_load_test()

Output:

***************whole: struct && weight**********************
type(m1)= <class '__main__.ConvTestNet'>
name = conv1.weight: param = torch.Size([6, 3, 5, 5])
name = conv1.bias: param = torch.Size([6])
name = conv2.weight: param = torch.Size([16, 6, 5, 5])
name = conv2.bias: param = torch.Size([16])
name = fc1.weight: param = torch.Size([120, 256])
name = fc1.bias: param = torch.Size([120])
name = fc2.weight: param = torch.Size([84, 120])
name = fc2.bias: param = torch.Size([84])
name = fc3.weight: param = torch.Size([10, 84])
name = fc3.bias: param = torch.Size([10])
***************only weight**********************
 type(m2) =   <class 'collections.OrderedDict'>
key = conv1.weight, value = torch.Size([6, 3, 5, 5])
key = conv1.bias, value = torch.Size([6])
key = conv2.weight, value = torch.Size([16, 6, 5, 5])
key = conv2.bias, value = torch.Size([16])
key = fc1.weight, value = torch.Size([120, 256])
key = fc1.bias, value = torch.Size([120])
key = fc2.weight, value = torch.Size([84, 120])
key = fc2.bias, value = torch.Size([84])
key = fc3.weight, value = torch.Size([10, 84])
key = fc3.bias, value = torch.Size([10])

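Conclusion: as shown above, when torch.load() loads the whole-model file, the returned object is still of type __main__.ConvTestNet. It contains both the weights and the model structure, and m1.named_parameters() yields each parameter's name together with its weight data. Because the class is stored by reference, this only works when ConvTestNet is defined (or importable) under the same module path in the loading script. When torch.load() loads the weights-only .pt file, however, m2 is of type <class 'collections.OrderedDict'>: an ordered dictionary mapping the name of each parameter in the network to its weight tensor, with no information about the model's structure.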
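The usual way back from a weights-only file is to rebuild the structure and copy the tensors in with load_state_dict(). The sketch below uses a hypothetical two-layer module, Small, in place of ConvTestNet and keeps the state_dict in memory instead of reading ./model_state_dict.pt. It also shows that strict=False reports mismatched keys through the returned missing_keys / unexpected_keys fields instead of raising.

```python
import torch
import torch.nn as nn


class Small(nn.Module):
    """Hypothetical stand-in for ConvTestNet: two linear layers, no forward needed here."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 3)
        self.fc2 = nn.Linear(3, 2)


src = Small()
weights = src.state_dict()  # plays the role of torch.load("./model_state_dict.pt")

# 1) Rebuild the structure, then copy the saved tensors into it (strict=True by default)
dst = Small()
dst.load_state_dict(weights)
print(torch.equal(dst.fc1.weight, src.fc1.weight))  # True

# 2) strict=False: mismatched keys are reported instead of raising
partial = {k: v for k, v in weights.items() if k.startswith("fc1.")}
result = Small().load_state_dict(partial, strict=False)
print(result.missing_keys)     # ['fc2.weight', 'fc2.bias']
print(result.unexpected_keys)  # []

# 3) strict=True with the same incomplete dict raises a RuntimeError
try:
    Small().load_state_dict(partial)
except RuntimeError as err:
    print("RuntimeError:", str(err).splitlines()[0])
```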

4. Converting .pt Files to ONNX with torch

4.1 Loading the complete .pt file and converting it to ONNX

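def pt2onnx():
    model = torch.load("./model_all.pt", weights_only=False)  # full model: needs weights_only=False on newer PyTorch
    model.eval()
    dummy_input = torch.randn(1, 3, 28, 28)  # assume the input is a 1x3x28x28 tensor
    torch.onnx.export(model,                       # model to convert
                  dummy_input,                 # example input used to record the graph
                  "./model_all.onnx",          # output ONNX file name
                  export_params=True,          # store the parameters in the ONNX file
                  opset_version=11,            # ONNX opset version
                  do_constant_folding=True,    # apply constant-folding optimization
                  input_names=['input'],       # input name
                  output_names=['output'],     # output name
                  dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})  # make the batch dimension dynamic
pt2onnx()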

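Visualizing the saved ONNX model shows that the conversion succeeded as expected: the ONNX model contains both the network structure and the model parameters.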
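torch.onnx.export runs the model on dummy_input to record the graph, so the dummy input must have a shape the network accepts. With fc1 = nn.Linear(256, 120) and x.view(-1, 256), that means 3 input channels and 28x28 images, while the batch size is free. The helper try_shape below (a hypothetical name) runs a few candidate shapes through a compact copy of ConvTestNet as a pre-export sanity check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTestNet(nn.Module):
    """Same layers as ConvTestNet above; gemfield/syszux omitted for brevity."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


model = ConvTestNet().eval()


def try_shape(shape):
    """Hypothetical helper: run a random input through the model and report the
    output shape, or the first line of the RuntimeError it raises."""
    with torch.no_grad():
        try:
            return tuple(model(torch.randn(*shape)).shape)
        except RuntimeError as err:
            return "RuntimeError: " + str(err).splitlines()[0]


for shape in [(1, 3, 28, 28), (4, 3, 28, 28), (1, 1, 28, 28), (1, 3, 32, 32)]:
    print(shape, "->", try_shape(shape))
```

A 1-channel input fails in conv1, and a 32x32 input produces 16 x 5 x 5 = 400 features per sample, which x.view(-1, 256) cannot reshape; a variant meant for 32x32 inputs would need nn.Linear(400, 120) and x.view(-1, 400) instead.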

4.2 Loading the weights-only .pt file and converting it

def pt2onnx_2():
    # model = ConvTestNet()
    model = torch.load("./model_state_dict.pt")  # returns an OrderedDict of tensors, not a model
    dummy_input = torch.randn(1, 3, 28, 28)  # assume the input is a 1x3x28x28 tensor
    torch.onnx.export(model,                       # model to convert
                  dummy_input,                 # example input used to record the graph
                  "./model_state_dict.onnx",   # output ONNX file name
                  export_params=True,          # store the parameters in the ONNX file
                  opset_version=11,            # ONNX opset version
                  do_constant_folding=True,    # apply constant-folding optimization
                  input_names=['input'],       # input name
                  output_names=['output'],     # output name
                  dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})  # make the batch dimension dynamic
pt2onnx_2()

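Running this test program raises an error: torch.load here returns an OrderedDict of tensors rather than an nn.Module, so torch.onnx.export has no model to run.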

Change model = torch.load("./model_state_dict.pt") to:

model = ConvTestNet()  # rebuild the model structure from code
model.load_state_dict(torch.load("./model_state_dict.pt"), strict=True)  # copy in the saved weights
model.eval()

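With this change, model_state_dict.onnx is saved successfully, and its graph is the same as the ONNX model above. Only the weight values differ, because model_save_whole_test() and model_save_weight_test() each create a freshly initialized ConvTestNet.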

In other words, model = ConvTestNet() supplies the model structure; combined with the weights stored in model_state_dict.pt, the model converts to ONNX normally.

5. Conclusions

1) Pros and cons of the two ways torch can save a model

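Saving method	Pros	Cons
`torch.save(model, "./model_all.pt")`	Simple to use: the complete model is saved and loaded directly, with no need to redefine the model structure by hand.	When loading, the model class definition must exist in the current environment (under the same module path); the file is somewhat larger; newer PyTorch versions also require torch.load(..., weights_only=False).
`torch.save(model.state_dict(), "./model_state_dict.pt")`	Smaller file; decoupled from the model code, more flexible, well suited to model migration and deployment.	The model structure must be defined by hand before the parameters can be loaded, which may be slightly more involved for beginners.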
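The file-size point in the table can be measured directly. The sketch below saves the same compact ConvTestNet (the layers defined above, without gemfield/syszux) both ways into in-memory buffers standing in for the two .pt files and compares their sizes. Both files contain the same weight tensors, so for this network the whole-model file should be only modestly larger; the extra bytes come from the pickled module objects.

```python
import io

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTestNet(nn.Module):
    """Same layers as ConvTestNet above; gemfield/syszux omitted for brevity."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


model = ConvTestNet()
whole, weights = io.BytesIO(), io.BytesIO()
torch.save(model, whole)                 # like torch.save(model, "./model_all.pt")
torch.save(model.state_dict(), weights)  # like torch.save(model.state_dict(), "./model_state_dict.pt")

n_params = sum(p.numel() for p in model.parameters())
print("parameters:", n_params)  # 44726
print("whole model bytes:", len(whole.getvalue()))
print("state_dict bytes: ", len(weights.getvalue()))
```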

2) Converting .pt to ONNX: the model structure definition must be available.

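If you saved only the model's weights with torch.save(model.state_dict(), "./model_state_dict.pt"), you must define and instantiate a model structure that matches the saved weights before converting to ONNX.
If you saved the whole model (structure and weights) with torch.save(model, "./model_all.pt"), the loaded object can be passed to the converter directly, as long as the model class definition is importable in the converting script.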
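Both cases can be folded into one loader that accepts either kind of .pt file. The helper load_model_for_export below is a hypothetical sketch, not a PyTorch API: it checks whether torch.load returned an nn.Module or a dict of tensors and, in the second case, builds the structure from a factory before loading the weights. In-memory buffers stand in for ./model_all.pt and ./model_state_dict.pt.

```python
import io

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvTestNet(nn.Module):
    """Same layers as ConvTestNet above; gemfield/syszux omitted for brevity."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(256, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 256)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


def load_model_for_export(f, model_factory):
    """Hypothetical helper: return an eval-mode nn.Module from either kind of .pt file.

    f             -- path or file-like object passed to torch.load
    model_factory -- callable that builds the architecture, e.g. ConvTestNet
    """
    # weights_only=False lets torch.load unpickle a whole nn.Module on newer PyTorch;
    # it runs arbitrary pickled code, so only use it on files you trust.
    obj = torch.load(f, weights_only=False)
    if isinstance(obj, nn.Module):   # saved with torch.save(model, ...)
        model = obj
    elif isinstance(obj, dict):      # saved with torch.save(model.state_dict(), ...)
        model = model_factory()      # the structure has to come from code
        model.load_state_dict(obj, strict=True)
    else:
        raise TypeError(f"unsupported checkpoint content: {type(obj).__name__}")
    return model.eval()


# Usage: save one model both ways, then load each file through the same helper
src = ConvTestNet()
whole, weights = io.BytesIO(), io.BytesIO()
torch.save(src, whole)
torch.save(src.state_dict(), weights)
whole.seek(0)
weights.seek(0)

m_whole = load_model_for_export(whole, ConvTestNet)
m_weights = load_model_for_export(weights, ConvTestNet)
x = torch.randn(1, 3, 28, 28)
with torch.no_grad():
    print(torch.allclose(m_whole(x), m_weights(x)))  # True
```

Either returned model can then be passed to torch.onnx.export with the same arguments as in the ONNX export code above.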