Model Deployment Series
- [Step-by-step tutorial with code] Full workflow for converting a PyTorch (.pth) model to TensorRT (.plan)
- [Step-by-step tutorial with code, part 2] Full workflow for converting a PyTorch (.pth) model to TensorRT (.plan), refined
- [Official docs explained] How to use torch.jit.script, with example code from the official documentation
- [CLIP model, .pt to .onnx] ValueError: Unsupported type for attn_mask: 5, solved
Problem description
During onnx_export, the export kept failing because the number of positional inputs did not match the model's forward():
TypeError: CombinedTimestepTextProjEmbeddings.forward() takes 3 positional arguments but 4 were given
onnx_export(
    pipeline.transformer,
    model_args=(
        torch.randn(1, 1024, 64).to(device=device, dtype=dtype),  # torch.Size([1, 4096, 64]) latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
        torch.randn(1).to(device=device, dtype=dtype),
        # None,
        torch.randn(1, text_hidden_size).to(device=device, dtype=dtype),  # pooled_prompt_embeds torch.Size([1, 768])
        torch.randn(1, 512, 4096).to(device=device, dtype=dtype),  # prompt_embeds torch.Size([1, 512, 4096])
        torch.randn(1, 512, 3).to(device=device, dtype=dtype),  # txt_ids=text_ids, torch.Size([1, 512, 3])
        torch.randn(1, 1024, 3).to(device=device, dtype=dtype),  # img_ids=latent_image_ids, torch.Size([1, 1024, 3])
        # None,
        False,
    ),
    output_path=transformer_path,
    ordered_input_names=["sample", "timestep", "guidance", "pooled_prompt_embeds", "prompt_embeds", "text_ids", "latent_image_ids", "joint_attention_kwargs", "return_dict"],
)
Locating the problem
Tracing back through the stack, I found that the model_args passed to onnx_export were not aligned with the inputs that pipeline.transformer actually receives at runtime.
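One way to see the mismatch without a debugger is to temporarily wrap forward() so it logs what the model actually receives while torch.onnx.export traces it. A minimal sketch, assuming pipeline is already loaded:

# Wrap forward() to log each positional argument received during tracing.
orig_forward = pipeline.transformer.forward

def logging_forward(*args, **kwargs):
    for i, a in enumerate(args):
        print(f"positional arg {i}: {getattr(a, 'shape', a)}")
    return orig_forward(*args, **kwargs)

pipeline.transformer.forward = logging_forward
# ... run onnx_export here, then restore the original method:
pipeline.transformer.forward = orig_forward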
Cause and solution
Setting a breakpoint at the first line of pipeline.transformer's forward() during onnx_export showed that the entries of model_args are bound one-to-one, by position, to the parameters of forward() itself, not to the keyword order used at the pipeline.transformer call site.
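Printing the signature gives the same information as the breakpoint; a short sketch using the standard-library inspect module:

import inspect

# forward()'s parameters in definition order; model_args must follow
# exactly this order because the entries are bound positionally.
sig = inspect.signature(pipeline.transformer.forward)
for i, (name, param) in enumerate(sig.parameters.items()):
    print(i, name, "default =", param.default)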
(1) ✅ The correct order: the parameter order of pipeline.transformer's own forward().
Taking diffusers' FLUX as an example, from /path/diffusers/models/transformers/transformer_flux.py:
def forward(
    self,
    hidden_states: torch.Tensor,
    encoder_hidden_states: torch.Tensor = None,
    pooled_projections: torch.Tensor = None,
    timestep: torch.LongTensor = None,
    img_ids: torch.Tensor = None,
    txt_ids: torch.Tensor = None,
    guidance: torch.Tensor = None,
    joint_attention_kwargs: Optional[Dict[str, Any]] = None,
    return_dict: bool = True,
) -> Union[torch.FloatTensor, Transformer2DModelOutput]:
(2) ❌ The wrong order I had used: the keyword order where the model is called,
from /path/diffusers/pipelines/flux/pipeline_flux.py:
noise_pred = self.transformer(
    hidden_states=latents,
    # YiYi notes: divide it by 1000 for now because we scale it by 1000 in the transformer model (we should not keep it but I want to keep the inputs same for the model for testing)
    timestep=timestep / 1000,
    guidance=guidance,
    pooled_projections=pooled_prompt_embeds,
    encoder_hidden_states=prompt_embeds,
    txt_ids=text_ids,
    img_ids=latent_image_ids,
    joint_attention_kwargs=self.joint_attention_kwargs,
    return_dict=False,
)[0]
The fix is to rewrite model_args in onnx_export following order (1). Note that the None and False values must be written out too, since the arguments are bound by position; the code below can be used as a reference.
onnx_export(
    pipeline.transformer,
    model_args=(
        torch.randn(1, 1024, 64).to(device=device, dtype=dtype),  # hidden_states: torch.Size([1, 4096, 64]) latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
        torch.randn(1, 512, 4096).to(device=device, dtype=dtype),  # encoder_hidden_states = prompt_embeds, torch.Size([1, 512, 4096])
        torch.randn(1, text_hidden_size).to(device=device, dtype=dtype),  # pooled_projections = pooled_prompt_embeds, torch.Size([1, 768])
        torch.randn(1).to(device=device, dtype=dtype),  # timestep
        torch.randn(1, 1024, 3).to(device=device, dtype=dtype),  # img_ids = latent_image_ids, torch.Size([1, 1024, 3])
        torch.randn(1, 512, 3).to(device=device, dtype=dtype),  # txt_ids = text_ids, torch.Size([1, 512, 3])
        None,   # guidance
        None,   # joint_attention_kwargs
        False,  # return_dict
    ),
    output_path=transformer_path,
    # Names follow forward()'s positional order, so latent_image_ids (img_ids)
    # comes before text_ids (txt_ids).
    ordered_input_names=["hidden_states", "encoder_hidden_states", "pooled_projections", "timestep", "latent_image_ids", "text_ids", "guidance", "joint_attention_kwargs", "return_dict"],
    output_names=["out_sample"],  # has to be different from "sample" for correct tracing
    dynamic_axes={
        # The key must match a name in ordered_input_names / output_names;
        # an unknown key such as "transformer_sample" would be ignored.
        "hidden_states": {1: "transformer_channels", 2: "transformer_size"},
    },
    opset=opset,
)
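To keep model_args from drifting out of sync with the model again, the tuple can also be derived from the signature itself. A sketch under the same assumptions as above (pipeline, device, dtype, and text_hidden_size already defined); the dict is keyed by forward()'s parameter names, not by the call-site keyword order:

import inspect
import torch

# Dummy inputs keyed by forward()'s parameter names, then reordered
# positionally from the signature so the tuple cannot drift.
inputs = {
    "hidden_states": torch.randn(1, 1024, 64).to(device=device, dtype=dtype),
    "encoder_hidden_states": torch.randn(1, 512, 4096).to(device=device, dtype=dtype),
    "pooled_projections": torch.randn(1, text_hidden_size).to(device=device, dtype=dtype),
    "timestep": torch.randn(1).to(device=device, dtype=dtype),
    "img_ids": torch.randn(1, 1024, 3).to(device=device, dtype=dtype),
    "txt_ids": torch.randn(1, 512, 3).to(device=device, dtype=dtype),
    "guidance": None,
    "joint_attention_kwargs": None,
    "return_dict": False,
}
sig = inspect.signature(pipeline.transformer.forward)
model_args = tuple(inputs[name] for name in sig.parameters)  # raises KeyError if a parameter is missing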