Unsupported: ONNX export of convolution for kernel of unknown shape

Luchang-Li

Last modified: 2024-07-24 21:56:48

First published: 2024-07-19 17:02:16

Article link: https://blog.csdn.net/u013701860/article/details/140554350

Reference case 1

[onnx]Unsupported: ONNX export of convolution for kernel of unknown shape · Issue #98497 · pytorch/pytorch · GitHub

import torch
import torch.nn as nn  # needed for nn.Module below

class Filter(nn.Module):
    def __init__(self):
        super().__init__()
        self.resample_filter = torch.rand(4,4)

    def forward(self, x):
        x = torch.nn.functional.pad(x, [1, 1, 1, 1])  # If this line is commented out, it works.
        weight = self.resample_filter[None, None].repeat([x.shape[1]  , 1] + [1] * self.resample_filter.ndim)
        x = torch.nn.functional.conv2d(input=x, padding=1, weight=weight, groups=x.shape[1] )
        return x


x = torch.rand((1, 3, 256, 256))
f = Filter()
y = f(x)
torch.onnx.export(f, x, "test-filter.onnx", opset_version=15)

Error message

File "/root/miniconda3/lib/python3.9/site-packages/torch/onnx/symbolic_opset9.py", line 2519, in _convolution
raise errors.SymbolicValueError(
torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of convolution for kernel of unknown shape. [Caused by the value '28 defined in (%28 : Float(*, *, *, *, strides=[199692, 66564, 258, 1], requires_grad=0, device=cpu) = onnx::Pad[mode="constant"](%0, %27, %3), scope: __main__.Filter:: # /mnt/f/codes/onnx_export/test.py:10:0
)' (type 'Tensor') in the TorchScript graph. The containing node has kind 'onnx::Pad'.]

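This problem usually appears when the convolution weight is not an ordinary trained parameter but is computed from another branch of the graph. Here the weight is built with repeat using x.shape[1], so its shape depends on the shape of the padded x.

Debugging:

Go to the location reported in the traceback, torch/onnx/symbolic_opset9.py, line 2519, and add a print statement there.

The error message says: Caused by the value '28 defined in (%28 : Float(*, *, *, *, strides=[199692, 66564, 258, 1], requires_grad=0, device=cpu)

Starting from %28, trace upward through the graph until you find the first value whose shape contains the unknown marker *: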

%28 : Float(*, *, *, *, strides=[199692, 66564, 258, 1], requires_grad=0, device=cpu) = onnx::Pad[mode="constant"](%0, %27, %3), scope: __main__.Filter:: # /mnt/f/codes/onnx_export/test.py:10:0
This shows that line 10 of test.py, the pad statement, is where the unknown shape first appears.
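
The source location can also be pulled out of the error text mechanically. The following is a minimal sketch and not part of the original workflow: find_source_location is a hypothetical helper name, and the regular expression assumes the trailing "# path.py:line:col" comment format seen in the graph line above.

```python
import re


def find_source_location(message):
    """Return (path, line) pairs for each '# file.py:line:col' comment in the text."""
    pattern = re.compile(r"#\s*(\S+\.py):(\d+):\d+")
    return [(path, int(line)) for path, line in pattern.findall(message)]


# Graph line copied from the error message in this article.
msg = (
    "%28 : Float(*, *, *, *, strides=[199692, 66564, 258, 1], "
    "requires_grad=0, device=cpu) = "
    'onnx::Pad[mode="constant"](%0, %27, %3), scope: __main__.Filter:: '
    "# /mnt/f/codes/onnx_export/test.py:10:0"
)
locations = find_source_location(msg)
print(locations)
```

In a large project this makes it easier to collect every file and line mentioned in a long export error before opening the code.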

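This is really a shape inference bug in the underlying exporter.

One solution is to fix shape inference for this case inside PyTorch itself.

The other is to work around it so that the shape is known before the tensor enters the conv. Here we add a reshape operator: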

class Filter(nn.Module):
    def __init__(self):
        super().__init__()
        self.resample_filter = torch.rand(4, 4)

    def forward(self, x):
        x = torch.nn.functional.pad(x, [1, 1, 1, 1])  # If this line is commented out, it works.
        shape = x.shape
        shape = [int(elem) for elem in shape]
        x = x.reshape(shape)

        weight = self.resample_filter[None, None].repeat([x.shape[1], 1] + [1] * self.resample_filter.ndim)
        x = torch.nn.functional.conv2d(input=x, padding=1, weight=weight, groups=x.shape[1])
        return x

Note that the change consists of three lines:

shape = x.shape
shape = [int(elem) for elem in shape]
x = x.reshape(shape)
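This makes the shape of x fully static again.

With this change the code can be exported.

If you want to export with a dynamic image size, consider applying the int conversion only to the batch and channel dimensions and check whether the export succeeds.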
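
A minimal sketch of that idea, assuming only height and width should stay dynamic: batch and channel are converted to Python ints, while the spatial sizes are passed through unchanged. The class name FilterDynamicHW and the axis names in the commented export call are illustrative. Whether the export actually succeeds depends on the PyTorch version, so treat this as something to try rather than a confirmed fix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FilterDynamicHW(nn.Module):
    def __init__(self):
        super().__init__()
        self.resample_filter = torch.rand(4, 4)

    def forward(self, x):
        x = F.pad(x, [1, 1, 1, 1])
        # Fix only batch and channel to plain ints; leave H and W as they are.
        n, c = int(x.shape[0]), int(x.shape[1])
        x = x.reshape(n, c, x.shape[2], x.shape[3])

        weight = self.resample_filter[None, None].repeat([c, 1, 1, 1])
        return F.conv2d(input=x, weight=weight, padding=1, groups=c)


model = FilterDynamicHW()
y = model(torch.rand(1, 3, 8, 8))  # eager sanity check only
print(y.shape)

# Possible export call (an untested assumption, not from the original post):
# torch.onnx.export(model, torch.rand(1, 3, 256, 256), "test-filter.onnx",
#                   opset_version=15, input_names=["x"],
#                   dynamic_axes={"x": {2: "height", 3: "width"}})
```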

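Sometimes the project code is too complex to tell where the pad comes from. In that case, insert the statements above directly into the forward of the torch pad code. For example: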

torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of convolution for kernel of unknown shape. [Caused by the value '2004 defined in (%2004 : Float(*, *, *, *, strides=[1102464, 5742, 87, 1], requires_grad=0, device=cuda:0) = onnx::Pad[mode="constant"](%xi, %2002, %2003), # /root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/padding.py:205:0

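In that case, go to torch/nn/modules/padding.py, line 205, and insert the code above there. Make sure it processes the result of the pad, not the input to the pad. You can also search the model code for pad operators and process their results in the same way.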
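
When the pad calls live in your own model code, the same three lines can be wrapped in a small helper instead of being pasted at each call site. This is only a sketch: pad_static is a hypothetical name, not a PyTorch function, and it applies the int conversion to the output of the pad, as required.

```python
import torch
import torch.nn.functional as F


def pad_static(x, pad):
    """Pad x, then reshape the result to a shape made of plain Python ints."""
    out = F.pad(x, pad)
    static_shape = [int(d) for d in out.shape]
    return out.reshape(static_shape)


x = torch.rand(1, 3, 8, 8)
y = pad_static(x, [1, 1, 1, 1])
print(y.shape)
```

In eager mode the helper returns exactly what F.pad returns, so swapping it in should not change model outputs.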

A similar workaround is to swap in an operator whose shape inference does not have the bug, for example replacing the pad above with a concat operator.
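
One way the pad-to-concat replacement might look for constant zero padding on a 4D tensor is sketched below. pad_with_cat is a hypothetical helper name, and it assumes the same padding width on all four sides. Whether it avoids the export error depends on the PyTorch version, so verify it on your own model.

```python
import torch
import torch.nn.functional as F


def pad_with_cat(x, p=1):
    """Zero-pad the last two dims of an NCHW tensor by p using torch.cat."""
    n, c, h, w = x.shape
    # Pad the width: zero columns on the left and right.
    zeros_w = x.new_zeros((n, c, h, p))
    x = torch.cat([zeros_w, x, zeros_w], dim=3)
    # Pad the height: zero rows on the top and bottom (width is now w + 2p).
    zeros_h = x.new_zeros((n, c, p, w + 2 * p))
    x = torch.cat([zeros_h, x, zeros_h], dim=2)
    return x


x = torch.rand(1, 3, 8, 8)
y = pad_with_cat(x, 1)
print(y.shape)
```

In eager mode the result matches torch.nn.functional.pad(x, [1, 1, 1, 1]) element for element.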

It is also best to upgrade PyTorch to the latest version, which may already have fixed some shape inference bugs.