Exporting MobileViT v2 to ONNX: the Col2Im operator fails to export

xunan003

Last modified 2023-10-12 14:45:44

Tags: deep learning, pytorch, artificial intelligence, transformer, onnx, MobileViT-v2, mobilevitv2-2.0

First published 2023-10-10 20:24:40

Article link: https://blog.csdn.net/xunan003/article/details/133752722

Error messages encountered during the export (discussed in the steps below):

num_dimensional_axis = symbolic_helper._get_tensor_sizes(output_size)[0]
TypeError: 'NoneType' object is not subscriptable
(Occurred when translating col2im)

onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. In Node, ("/classifier/classifier.0/ReduceMean", ReduceMean, "", -1) : ("/layer_5/layer_5.1/conv_proj/block/conv/Conv_output_0": tensor(float),) -> ("/classifier/classifier.0/ReduceMean_output_0",) , Error Unrecognized attribute: axes for operator ReduceMean

torch.onnx.errors.CheckerError: Unrecognized attribute: axes for operator ReduceMean
Context: Bad node spec for node. Name: /classifier/classifier.0/ReduceMean OpType: ReduceMean

raise ValueError(f"Unsupported ONNX opset version: {value}")
ValueError: Unsupported ONNX opset version: 18

torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::col2im' to ONNX opset version 13 is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub: https://github.com/pytorch/pytorch/issues

1. Initial export

We first export the original model directly, with opset_version set to 13, by inserting the following script:

    input = torch.randn((1, 3, 224, 224))
    torch.onnx.export(model, input, "weights/mobilevitv2-2.0_opset13.onnx", input_names=["input"], output_names=["output"], verbose=True, opset_version=13)

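The export fails with the UnsupportedOperatorError for aten::col2im quoted at the top of this post.

The log says that the network contains a col2im operator, which ONNX opset version 13 does not support; support only arrives with opset version 18. So we set opset_version to 18 in torch.onnx.export().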

2. Setting opset 18

    input = torch.randn((1, 3, 224, 224))
    torch.onnx.export(model, input, "weights/mobilevitv2-2.0_opset18.onnx", input_names=["input"], output_names=["output"], verbose=True, opset_version=18)

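Running it produces the ValueError quoted at the top of this post (Unsupported ONNX opset version: 18).

The log shows that the installed torch version cannot export opset version 18, so torch has to be upgraded.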

3. Upgrading torch

We upgraded all torch-related packages to their latest versions.
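Before and after the upgrade it can help to record which versions are actually installed. The following is a minimal sketch that uses only the Python standard library; the helper name installed_versions and the default package list are illustrative assumptions, not something from the original setup:

```python
from importlib import metadata


def installed_versions(packages=("torch", "onnx", "onnxruntime")):
    """Map each package name to its installed version string, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


print(installed_versions())
```

Comparing the output before and after the upgrade makes it easy to confirm that the export is really running against the new torch.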

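Running the export again still fails, this time with the TypeError raised while translating col2im (quoted at the top of this post).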

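4. Fixing the output dimensions

After searching the related issues on Google, we confirmed that torch.onnx.export cannot export a col2im operator whose target output shape is dynamic. We therefore located the part of the forward pass that corresponds to the Col2Im operator.

The F.fold() call there is the torch-side Col2Im operator, and the variable-length input that cannot be exported is output_size. Debugging shows that output_size takes only three values: (28, 28), (14, 14) and (8, 8). So we only need an if/else that replaces it with fixed values.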
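A minimal sketch of what such an if/else could look like is given here. It is not the author's original modification: the helper name fixed_output_size is hypothetical, and the three sizes are simply the ones reported above for a 224x224 input.

```python
# Hypothetical helper (illustrative name, not from the MobileViT v2 source).
# It maps the size seen at run time to a hard-coded tuple of plain Python ints.
def fixed_output_size(height, width):
    size = (int(height), int(width))
    if size == (28, 28):
        return (28, 28)
    elif size == (14, 14):
        return (14, 14)
    elif size == (8, 8):
        return (8, 8)
    # Any other size means the input resolution differs from the one debugged above.
    raise ValueError(f"unexpected output_size {size}; add a branch for it")
```

In the forward pass the result would then be passed on, for example as F.fold(patches, output_size=fixed_output_size(h, w), kernel_size=..., stride=...), where patches, h and w stand for whatever the surrounding code already has. Note that this ties the exported model to the input resolution used during debugging.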

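After this change the script exports an opset version 18 ONNX model successfully; the corresponding Col2Im operator is shown in the figure below.

Although the export log shows a ReduceMean error (the CheckerError quoted at the top of this post), the ONNX model is still saved.

The key information concerns the axes attribute of ReduceMean; see step 5 for the fix.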

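5. Inference error

After exporting the model, we ran inference with onnxruntime and got the InvalidGraph error quoted at the top of this post.

Put simply, the ReduceMean operator in the network has an attribute, axes, that is not recognized. According to the official ONNX documentation, in opset 13 the axes that determine which dimensions ReduceMean reduces over are an attribute, whereas in opset 18 they must be passed as an input of the node, as shown in the figure below:

We therefore suspected a bug in torch.onnx.export: when exporting ReduceMean it still follows the old scheme and places axes in ATTRIBUTES. Inspecting the freshly exported ONNX model confirmed this, as shown in the figure below: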
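The same inspection can also be done programmatically instead of in a graph viewer. The sketch below is an assumption-laden illustration: the function name find_legacy_reducemean is made up for this example, and it only relies on each node exposing op_type, name and attribute, which is how ONNX NodeProto objects look.

```python
def find_legacy_reducemean(nodes):
    """Return the names of ReduceMean nodes that still carry axes as an attribute."""
    legacy = []
    for node in nodes:
        if node.op_type != "ReduceMean":
            continue
        # In the opset 18 layout, axes is an input and must not appear here.
        if any(attr.name == "axes" for attr in node.attribute):
            legacy.append(node.name)
    return legacy


# Intended use with the onnx package (assumed to be installed):
#   import onnx
#   model = onnx.load("./weights/mobilevitv2-2.0_opset18.onnx")
#   print(find_legacy_reducemean(model.graph.node))
```

An empty list would mean that no ReduceMean node needs the manual repair.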

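So we load the original ONNX model in a Python script and move axes by hand: we create a new axes tensor with onnx.helper.make_tensor and add it to the INPUTS of the ReduceMean operator, after which the model runs normally. A script for this manual fix is provided below for reference: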

import onnx
from onnx import TensorProto

onnx_model = onnx.load_model("./weights/mobilevitv2-2.0_opset18.onnx")
for node in onnx_model.graph.node:
    if node.op_type != 'ReduceMean':
        continue
    axes = None
    for idx, cur_attr in enumerate(node.attribute):
        if cur_attr.name == 'axes':
            # Remember the axes values ([-2, -1] in this model), then delete
            # axes from the original attributes
            axes = list(cur_attr.ints)
            del node.attribute[idx]
            break
    if axes is None:
        # No axes attribute: this node already follows the opset 18 layout
        continue
    # Create a tensor that holds the axes
    axes_tensor = onnx.helper.make_tensor(name=node.name + '_axes',
                                          data_type=TensorProto.INT64,
                                          dims=[len(axes)],
                                          vals=axes)
    # Add the new axes tensor to the initializer list of the ONNX model
    onnx_model.graph.initializer.append(axes_tensor)
    # Append the name of the axes tensor to the inputs of the ReduceMean node
    node.input.append(axes_tensor.name)
onnx.save_model(onnx_model, "./weights/mobilevitv2-2.0_opset18_repair_reducemean.onnx")

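For a general-purpose onnxruntime inference script, see my GitHub project (if you find it useful, please give it a star): https://github.com/xncaffe/caffe_convert_onnx . After installing the dependencies as described in its tutorial, you can run inference with onnx_inference.py.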

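6. Follow-up approach

Although the method above exports an opset version 18 ONNX model, many mobile platforms on the market do not support opset version 18, and downgrading the opset afterwards is a lot of work and is not officially supported by ONNX. So how can we export an ONNX model at any opset version without changing the model structure?

To answer this, we analyzed what Col2Im does in this model. It is really just a dimension rearrangement, exactly the inverse of the Gather part at the front of the network: each Col2Im corresponds to one Gather+Transpose group, as in the figure above. The whole process can therefore be replaced by a combination of Reshape and Transpose operators. So we modified the forward pass to use reshape+permute instead of the F.fold call described above.

After this change the model can be exported at any opset version, and the part that used to be Col2Im appears in the ONNX model as shown below:

Running onnx_inference.py on the exported opset 13 model and on the earlier opset 18 model that uses Col2Im gives exactly the same results, which confirms that our reasoning is correct, as shown in the figure below: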
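The equivalence behind this replacement can also be checked numerically on a small tensor. The sketch below uses NumPy as a stand-in for torch and is not the author's modified forward pass. It assumes that the kernel size equals the stride, with no padding or dilation, so that patches do not overlap; the function names col2im_reference and col2im_reshape are made up for this example.

```python
import numpy as np


def col2im_reference(cols, output_size, kernel_size):
    """Naive fold for non-overlapping patches: cols is (N, C*kh*kw, L)."""
    n, ckk, num_blocks = cols.shape
    kh, kw = kernel_size
    h, w = output_size
    c = ckk // (kh * kw)
    blocks_w = w // kw  # number of patches along the width
    out = np.zeros((n, c, h, w), dtype=cols.dtype)
    for ch in range(c):
        for p in range(kh):
            for q in range(kw):
                row = ch * kh * kw + p * kw + q
                for l in range(num_blocks):
                    i, j = divmod(l, blocks_w)  # patch row and column
                    out[:, ch, i * kh + p, j * kw + q] += cols[:, row, l]
    return out


def col2im_reshape(cols, output_size, kernel_size):
    """Same result using only reshape and transpose."""
    n, ckk, _ = cols.shape
    kh, kw = kernel_size
    h, w = output_size
    c = ckk // (kh * kw)
    x = cols.reshape(n, c, kh, kw, h // kh, w // kw)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # -> (N, C, H/kh, kh, W/kw, kw)
    return x.reshape(n, c, h, w)


cols = np.arange(1 * 12 * 16, dtype=np.float32).reshape(1, 12, 16)
same = np.array_equal(col2im_reference(cols, (8, 8), (2, 2)),
                      col2im_reshape(cols, (8, 8), (2, 2)))
print("reshape+transpose matches fold:", same)
```

In torch, under the same assumptions, the second function would correspond to x.reshape(n, c, kh, kw, h // kh, w // kw).permute(0, 1, 4, 2, 5, 3).reshape(n, c, h, w), which exports as Reshape and Transpose nodes rather than Col2Im.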
