[CLIP model: from .pt to .onnx] ValueError: Unsupported type for attn_mask: 5 (solved)

多恩Stone

Article link: https://blog.csdn.net/weixin_44212848/article/details/139062951


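During the model design stage (writing papers, doing research), the model structure and parameters change frequently, and the most common weight formats are .pt, .pth, and .ckpt.

.pt and .pth are the standard PyTorch extensions. They are functionally identical and store a model's weights (and, if the whole model object is saved, its structure as well).
.ckpt is the extension commonly used by the PyTorch Lightning framework. It stores a more detailed training checkpoint, including the model weights and the training state.

In the application stage (production, model deployment), the model structure and hyperparameters are fixed and no longer change often. At that point the model usually needs to be converted as follows:

.pt (weights) + the model-definition .py code (the fixed model structure) = .onnx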

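CLIP is widely used across many algorithms and usually has to handle different input types, so the model is converted separately for each input. As shown in the figure below, the model is split into two parts, img and txt, and each part is converted on its own.
[Figure: the CLIP model split into img and txt parts]
Conversion steps:

Determine the inputs and outputs of the original model
Determine the model-definition code for the model being converted
Convert with torch.onnx.export
Decide whether dynamic dimensions need to be set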
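
The four steps above can be sketched on a toy model before touching CLIP. This is only a minimal sketch: the helper names `batch_dynamic_axes` and `export_toy_model`, the toy network, and the output file name are all made up for illustration and are not part of the original algorithm.

```python
from typing import Dict, List


def batch_dynamic_axes(input_names: List[str],
                       output_names: List[str]) -> Dict[str, Dict[int, str]]:
    """Step 4: mark axis 0 of every named input/output as a dynamic batch axis."""
    return {name: {0: "batch_size"} for name in input_names + output_names}


def export_toy_model(path: str = "toy.onnx") -> None:
    """Steps 1-3 on a small stand-in model (hypothetical, not CLIP)."""
    import torch  # local import so the helper above works without torch
    import torch.nn as nn

    # Step 2: the model definition (here a trivial network in eval mode)
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 4)).eval()
    # Step 1: a dummy input with the shape the model expects
    dummy = torch.randn(1, 3, 8, 8)
    # Step 3: export, naming inputs/outputs so dynamic_axes can refer to them
    with torch.no_grad():
        torch.onnx.export(
            model, (dummy,), path,
            opset_version=17,
            input_names=["input"],
            output_names=["output"],
            dynamic_axes=batch_dynamic_axes(["input"], ["output"]),
        )
```

Only axes that can really vary at inference time should be listed in `dynamic_axes`; everything else is fixed to the shape of the dummy input.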

import torch
import torch.nn as nn
from typing import Union, List, Tuple
from functools import partial

# CLIP wrapper for image input; the model-definition code is lifted from the original algorithm
class ImgModelWrapper(nn.Module):

    def __init__(self,
                 clip_model_name: str,
                 download_root: str = None,
                 device: torch.device = "cuda" if torch.cuda.is_available() else "cpu",
                 jit: bool = False,
                 # additional params
                 visual_score: bool = False,
                 feats_loss_type: str = None,
                 feats_loss_weights: List[float] = None,
                 fc_loss_weight: float = None,
                 context_length: int = 77):
        super().__init__()

        import clip  # local import

        # check model info
        self.clip_model_name = clip_model_name
        self.device = device
        self.available_models = clip.available_models()
        assert clip_model_name in self.available_models, f"A model backbone: {clip_model_name} that does not exist"

        # load CLIP
        self.model, self.preprocess = clip.load(clip_model_name, device, jit=jit, download_root=download_root)
        self.model.eval()
    
    def forward(self, image):
        # Move the input to the device the model was loaded on
        image = image.to(self.device)
        image_features = self.model.encode_image(image)
        return image_features
        
# CLIP wrapper for text input; the model-definition code is lifted from the original algorithm
class TxtModelWrapper(nn.Module):

    def __init__(self,
                 clip_model_name: str,
                 download_root: str = None,
                 device: torch.device = "cuda" if torch.cuda.is_available() else "cpu",
                 jit: bool = False,
                 # additional params
                 visual_score: bool = False,
                 feats_loss_type: str = None,
                 feats_loss_weights: List[float] = None,
                 fc_loss_weight: float = None,
                 context_length: int = 77):
        super().__init__()

        import clip  # local import

        # check model info
        self.clip_model_name = clip_model_name
        self.device = device
        self.available_models = clip.available_models()
        assert clip_model_name in self.available_models, f"A model backbone: {clip_model_name} that does not exist"

        # load CLIP
        self.model, self.preprocess = clip.load(clip_model_name, device, jit=jit, download_root=download_root)
        self.model.eval()

        # load tokenize
        self.tokenize_fn = partial(clip.tokenize, context_length=context_length)
    
    def forward(self, text, norm: bool = True):
        # Tokenize in Python, then move the tokens to the device the model was loaded on
        tokens = self.tokenize_fn(text).to(self.device)
        # print('tokens', tokens.shape) # torch.Size([79, 77])
        txt_features = self.model.encode_text(tokens)
        # if norm:
        #     text_features = txt_features.mean(axis=0, keepdim=True)
        #     text_features_norm = text_features / text_features.norm(dim=-1, keepdim=True)
        #     return text_features_norm
        return txt_features

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use the GPU if it is available, otherwise the CPU

clip_img = ImgModelWrapper("ViT-B/32", device=device)
clip_txt = TxtModelWrapper("ViT-B/32", device=device)

# Dummy input, adjusted to the input size of the original algorithm
x_img = torch.randn(1, 3, 224, 224)  # of axes 0, 1, 2, 3, axes 0, 2, 3 are marked dynamic
 
with torch.no_grad(): 
    torch.onnx.export(
        clip_img, 
        x_img, 
        "clip_img.onnx", 
        opset_version=17, 
        input_names=['input'], 
        output_names=['output'],
        dynamic_axes={'input' : {0 : 'batch_size',
                                 2 : 'height',
                                 3 : 'width'},
                      # encode_image returns [batch_size, embed_dim], so only axis 0 can be dynamic
                      'output' : {0 : 'batch_size'}})

x_txt = ['adjust to match the input of the original algorithm...']

with torch.no_grad(): 
    torch.onnx.export(
        clip_txt, 
        x_txt, 
        "clip_txt.onnx", 
        opset_version=17, 
        input_names=['input'], 
        output_names=['output'],
        dynamic_axes={'input' : {0 : 'batch_size',
                                 1 : 'seq'},
                      # encode_text returns [batch_size, embed_dim], so only axis 0 can be dynamic
                      'output' : {0 : 'batch_size'}})
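
One caveat about the text export above: `clip.tokenize` is ordinary Python, so as far as I can tell it runs once during tracing and the resulting token IDs are baked into the graph rather than exposed as a real model input. A possible alternative, sketched below, keeps tokenization outside the model and feeds the token tensor in. The names `TxtTokenWrapper` and `export_text_encoder`, the prompt, and the output file name are hypothetical; loading on the CPU is my assumption, not what the original code does.

```python
import torch
import torch.nn as nn


class TxtTokenWrapper(nn.Module):
    """Hypothetical wrapper: takes token IDs instead of raw strings."""

    def __init__(self, clip_model: nn.Module):
        super().__init__()
        self.model = clip_model.eval()

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: integer tensor of shape [batch_size, context_length]
        return self.model.encode_text(tokens)


def export_text_encoder(path: str = "clip_txt_tokens.onnx") -> None:
    import clip  # local import, as in the wrappers above

    # Assumption: load on the CPU, where clip.load keeps float32 weights
    model, _ = clip.load("ViT-B/32", device="cpu")
    tokens = clip.tokenize(["a photo of a cat"])  # shape [1, 77]
    with torch.no_grad():
        torch.onnx.export(
            TxtTokenWrapper(model), (tokens,), path,
            opset_version=17,
            input_names=["tokens"],
            output_names=["output"],
            # context_length is fixed at 77, so only the batch axis is dynamic
            dynamic_axes={"tokens": {0: "batch_size"},
                          "output": {0: "batch_size"}},
        )
```

With this layout the caller runs `clip.tokenize` at inference time and passes the resulting tensor to the ONNX model.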

Converting the text CLIP model produced the following error:

  File "/path/to/lib/python3.10/site-packages/torch/onnx/symbolic_opset14.py", line 197, in scaled_dot_product_attention
    raise ValueError(
ValueError: Unsupported type for attn_mask: 5

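The problem can be investigated along the following lines:

Check that the model and the input data are correct.
✅ Upgrade PyTorch and ONNX.
Check the type of attn_mask and convert it to a supported type.
If necessary, write a custom symbolic function.

In the end, upgrading PyTorch and ONNX solved it. The likely cause is that the scaled_dot_product_attention symbolic in the older symbolic_opset14 did not support this attn_mask type.

The commands for upgrading PyTorch and ONNX are: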
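
As for what the `5` might mean: assuming the number is printed from PyTorch's internal scalar-type enum, and assuming that enum follows the usual ordering (uint8, int8, int16, int32, int64, float16, float32, float64), code 5 would be float16. That would fit, because `clip.load` keeps half-precision weights on the GPU, so the attention mask ends up as float16 too. The table and both helper names below are my own illustration, not a PyTorch API, and the float32 cast is an untested workaround for cases where upgrading is not an option.

```python
# Assumed ordering of the scalar-type codes; a hypothetical lookup table,
# not something imported from PyTorch.
ASSUMED_SCALAR_TYPE_NAMES = [
    "uint8", "int8", "int16", "int32", "int64",
    "float16", "float32", "float64",
]


def describe_scalar_type(code: int) -> str:
    """Translate a numeric scalar-type code into a readable dtype name."""
    if 0 <= code < len(ASSUMED_SCALAR_TYPE_NAMES):
        return ASSUMED_SCALAR_TYPE_NAMES[code]
    return f"unknown({code})"


def to_float32_for_export(model):
    """Possible workaround: cast the model to float32 before exporting,
    so the attention mask is float32 rather than float16."""
    return model.float().eval()


if __name__ == "__main__":
    print(describe_scalar_type(5))
```

With the wrappers from this post, the workaround would be applied as `to_float32_for_export(clip_txt)` before calling `torch.onnx.export`.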

pip install --upgrade torch torchvision
pip install --upgrade onnx
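
Before and after upgrading, it can help to record which versions are actually installed in the active environment. A small sketch using only the standard library; the function name `installed_versions` is made up for this example.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("torch", "torchvision", "onnx"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it once before and once after the `pip install --upgrade` commands shows whether the upgrade really took effect in the environment used for the export.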

In the end, both .onnx models were exported successfully! 🎉
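
A quick sanity check of the exported image model might look like the sketch below. It assumes the `onnx`, `onnxruntime`, and `numpy` packages are installed, that the file is named `clip_img.onnx` with an input called `input` (as in the export code above), and that the model can run on the CPU execution provider. The helper names are hypothetical.

```python
from typing import List, Tuple


def batch_shapes(batch_sizes: List[int],
                 tail: Tuple[int, ...] = (3, 224, 224)) -> List[Tuple[int, ...]]:
    """Build input shapes that differ only in the batch dimension."""
    return [(b, *tail) for b in batch_sizes]


def check_image_model(path: str = "clip_img.onnx") -> None:
    import numpy as np  # local imports: only needed when the check is run
    import onnx
    import onnxruntime as ort

    # Structural validation of the exported graph
    onnx.checker.check_model(onnx.load(path))

    # Run the model with several batch sizes to exercise the dynamic batch axis
    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    for shape in batch_shapes([1, 4]):
        dummy = np.random.randn(*shape).astype(np.float32)
        (features,) = session.run(None, {"input": dummy})
        print(shape, "->", features.shape)
```

If the batch axis was exported as dynamic, both batch sizes should run without a shape error.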

