Model Deployment Series
- [Step-by-step tutorial with code] Full workflow for converting a PyTorch (.pth) model to TensorRT (.plan)
- [Step-by-step tutorial with code, part 2] Full workflow for converting a PyTorch (.pth) model to TensorRT (.plan), refined
- [Official docs explained] How to use torch.jit.script, with example code from the official documentation
- [CLIP model, .pt to .onnx] ValueError: Unsupported type for attn_mask: 5, solved
Problem description
During onnx_export, the export kept failing because the number of positional inputs did not match the model's forward():
TypeError: CombinedTimestepTextProjEmbeddings.forward() takes 3 positional arguments but 4 were given
onnx_export(
    pipeline.transformer,
    model_args=(
        torch.randn(1, 1024, 64).to(device=device, dtype=dtype),  # torch.Size([1, 4096, 64]) latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
        torch.randn(1).to(device=device, dtype=dtype),
        # None,
        torch.randn(1, text_hidden_size).to(device=device, dtype=dtype),  # pooled_prompt_embeds torch.Size([1, 768])
        torch.randn(1, 512, 4096).to(device=device, dtype=dtype),  # prompt_embeds torch.Size([1, 512, 4096])
        torch.randn(1, 512, 3).to(device=device, dtype=dtype),  # txt_ids=text_ids, torch.Size([1, 512, 3])
        torch.randn(1, 1024, 3).to(device=device, dtype=dtype),  # img_ids=latent_image_ids, torch.Size([1, 1024, 3])
        # None,
        False,
    ),
    output_path=transformer_path,
    ordered_input_names=["sample", "timestep", "guidance", "pooled_prompt_embeds", "prompt_embeds", "text_ids", "latent_image_ids", "joint_attention_kwargs", "return_dict"],
)
Locating the problem
Tracing back through the stack, I found that the model_args passed to onnx_export were not aligned with the inputs that pipeline.transformer actually receives at runtime.
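One way to see the mismatch without a debugger is to temporarily wrap forward() so it logs what the model actually receives while torch.onnx.export traces it. A minimal sketch, assuming pipeline is already loaded:

# Wrap forward() to log each positional argument received during tracing.
orig_forward = pipeline.transformer.forward

def logging_forward(*args, **kwargs):
    for i, a in enumerate(args):
        print(f"positional arg {i}: {getattr(a, 'shape', a)}")
    return orig_forward(*args, **kwargs)

pipeline.transformer.forward = logging_forward
# ... run onnx_export here, then restore the original method:
pipeline.transformer.forward = orig_forward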
Cause and solution
Setting a breakpoint at the first line of pipeline.transformer's forward() during onnx_export showed that the entries of model_args are bound one-to-one, by position, to the parameters of forward() itself, not to the keyword order used at the pipeline.transformer call site.
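Printing the signature gives the same information as the breakpoint; a short sketch using the standard-library inspect module:

import inspect

# forward()'s parameters in definition order; model_args must follow
# exactly this order because the entries are bound positionally.
sig = inspect.signature(pipeline.transformer.forward)
for i, (name, param) in enumerate(sig.parameters.items()):
    print(i, name, "default =", param.default)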
(1) ✅ The correct order: the parameter order of pipeline.transformer's own forward().
Taking diffusers' FLUX as an example, from /path/diffusers/models/transformers/transformer_flux.py:
def forward(
    self,
    hidden_states: torch.Tensor,
    encoder_hidden_states: torch.Tensor = None,
    pooled_projections: torch.Tensor = None,
    timestep: torch.LongTensor = None,
    img_ids: torch.Tensor = None,
    txt_ids: torch.Tensor = None,
    guidance: torch.Tensor = None,
    joint_attention_kwargs: Optional[Dict[str, Any]] = None,
    return_dict: bool = True,
) -> Union[torch.FloatTensor, Transformer2DModelOutput]:
(2) ❌ The wrong order I had used: the keyword order where the model is called,
from /path/diffusers/pipelines/flux/pipeline_flux.py:
noise_pred = self.transformer(
    hidden_states=latents,
    # YiYi notes: divide it by 1000 for now because we scale it by 1000 in the transformer model (we should not keep it but I want to keep the inputs same for the model for testing)
    timestep=timestep / 1000,
    guidance=guidance,
    pooled_projections=pooled_prompt_embeds,
    encoder_hidden_states=prompt_embeds,
    txt_ids=text_ids,
    img_ids=latent_image_ids,
    joint_attention_kwargs=self.joint_attention_kwargs,
    return_dict=False,
)[0]
The fix is to rewrite model_args in onnx_export following order (1). Note that the None and False values must be written out too, since the arguments are bound by position; the code below can be used as a reference.
onnx_export(
    pipeline.transformer,
    model_args=(
        torch.randn(1, 1024, 64).to(device=device, dtype=dtype),  # hidden_states: torch.Size([1, 4096, 64]) latents = latents.reshape(batch_size, (height // 2) * (width // 2), num_channels_latents * 4)
        torch.randn(1, 512, 4096).to(device=device, dtype=dtype),  # encoder_hidden_states = prompt_embeds, torch.Size([1, 512, 4096])
        torch.randn(1, text_hidden_size).to(device=device, dtype=dtype),  # pooled_projections = pooled_prompt_embeds, torch.Size([1, 768])
        torch.randn(1).to(device=device, dtype=dtype),  # timestep
        torch.randn(1, 1024, 3).to(device=device, dtype=dtype),  # img_ids = latent_image_ids, torch.Size([1, 1024, 3])
        torch.randn(1, 512, 3).to(device=device, dtype=dtype),  # txt_ids = text_ids, torch.Size([1, 512, 3])
        None,   # guidance
        None,   # joint_attention_kwargs
        False,  # return_dict
    ),
    output_path=transformer_path,
    # Names follow forward()'s positional order, so latent_image_ids (img_ids)
    # comes before text_ids (txt_ids).
    ordered_input_names=["hidden_states", "encoder_hidden_states", "pooled_projections", "timestep", "latent_image_ids", "text_ids", "guidance", "joint_attention_kwargs", "return_dict"],
    output_names=["out_sample"],  # has to be different from "sample" for correct tracing
    dynamic_axes={
        # The key must match a name in ordered_input_names / output_names;
        # an unknown key such as "transformer_sample" would be ignored.
        "hidden_states": {1: "transformer_channels", 2: "transformer_size"},
    },
    opset=opset,
)
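To keep model_args from drifting out of sync with the model again, the tuple can also be derived from the signature itself. A sketch under the same assumptions as above (pipeline, device, dtype, and text_hidden_size already defined); the dict is keyed by forward()'s parameter names, not by the call-site keyword order:

import inspect
import torch

# Dummy inputs keyed by forward()'s parameter names, then reordered
# positionally from the signature so the tuple cannot drift.
inputs = {
    "hidden_states": torch.randn(1, 1024, 64).to(device=device, dtype=dtype),
    "encoder_hidden_states": torch.randn(1, 512, 4096).to(device=device, dtype=dtype),
    "pooled_projections": torch.randn(1, text_hidden_size).to(device=device, dtype=dtype),
    "timestep": torch.randn(1).to(device=device, dtype=dtype),
    "img_ids": torch.randn(1, 1024, 3).to(device=device, dtype=dtype),
    "txt_ids": torch.randn(1, 512, 3).to(device=device, dtype=dtype),
    "guidance": None,
    "joint_attention_kwargs": None,
    "return_dict": False,
}
sig = inspect.signature(pipeline.transformer.forward)
model_args = tuple(inputs[name] for name in sig.parameters)  # raises KeyError if a parameter is missing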