PyTorch to ONNX: F.interpolate

麦克斯韦恶魔

Category: Study Notes # PYT. Tags: onnx pytorch interpolate opset_version upsample

Article link: https://blog.csdn.net/github_28260175/article/details/105704337

I. Environment

Conda 4.7.11
Python 3.6.9
PyTorch 1.4.0
ONNX 1.6.0
protobuf 3.9.2

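II. ONNX installation issues

If you install with `conda install onnx` or `conda install -c conda-forge onnx`, `import onnx` is very likely to fail with `ImportError: libprotobuf.so.20: cannot open shared object file: No such file or directory`. The cause is that, for the current Python version, the protobuf version that conda installs by default is too old. Following issue #2434, install with the commands below: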

conda install protobuf=3.9
conda install -c conda-forge onnx
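
To confirm that the installation worked, a small check like the one below can help. It is only a sketch: `installed_version` is a hypothetical helper name, and it assumes each package exposes a `__version__` attribute. A missing `libprotobuf` shared object shows up as an `ImportError`, which the helper reports instead of crashing.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # A missing libprotobuf shared object surfaces here as ImportError.
        print(f"{module_name}: import failed ({exc})")
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("google.protobuf", "onnx", "torch"):
        print(name, installed_version(name))
```

If `onnx` prints `None` together with the `libprotobuf.so` message, the protobuf version is the first thing to check.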

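III. F.interpolate

The most painful part of converting PyTorch to ONNX so far is the unsatisfactory support for `F.interpolate`: as soon as a model upsamples with `F.interpolate`, problems appear. They are as follows:

1. ONNX operator set version `opset_version`

During conversion we usually call `torch.onnx.export(model, input, "onnx_name.onnx")`, which defaults to `opset_version=9`. After switching to `opset_version=10` and `opset_version=11`, the exported graphs can be visualized in Netron and compared, as shown below.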

[Figure: Netron view of the same node exported with opset 9, 10, and 11]

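As the comparison shows, for the same node with `F.interpolate(mode='bilinear', align_corners=False)`, op9 replaces `F.interpolate` with `onnx.Upsample` and op10 replaces it with `onnx.Resize`, while op11 additionally emits an `onnx.Constant` holding a tensor, and its `onnx.Resize` carries extra attributes.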
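
The comparison can also be reproduced without Netron by exporting a tiny model once per opset and listing the operator types in each graph. This is a minimal sketch, assuming `torch` and `onnx` are installed; `Upsampler`, `export_and_list_ops` and the output file names are hypothetical, and `EXPECTED_OP` simply records the observation described in this post.

```python
# Operator that F.interpolate is exported as, per the comparison in this post.
EXPECTED_OP = {9: "Upsample", 10: "Resize", 11: "Resize"}


def export_and_list_ops(opset, path=None):
    """Export a tiny upsampling model and return the op types in its graph."""
    # Imported lazily so EXPECTED_OP can be used without torch/onnx installed.
    import onnx
    import torch
    import torch.nn.functional as F

    class Upsampler(torch.nn.Module):
        def forward(self, x):
            return F.interpolate(x, scale_factor=2, mode="bilinear",
                                 align_corners=False)

    path = path or f"interp_op{opset}.onnx"
    dummy = torch.randn(1, 3, 5, 5)
    torch.onnx.export(Upsampler(), dummy, path, opset_version=opset)
    return [node.op_type for node in onnx.load(path).graph.node]
```

Calling `export_and_list_ops(opset)` for each key of `EXPECTED_OP` and checking whether the expected operator name appears in the returned list gives a quick text-only version of the visual comparison. The exact node list may differ between PyTorch versions.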

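So what exactly distinguishes the three? I may not be able to give a precise answer; what follows is my own reading.

op9 and op10 appear to be quite similar, apart from the operator changing from `onnx.Upsample` to `onnx.Resize`; the ONNX source would have to be checked to see how the two really differ. Note the `scales` entry under INPUTS: both versions use the same `scales`, which comes from an `onnx.Constant` node that is not shown in the visualization, and its data type is float32. This is the key difference between op9/op10 and op11, and the source of the pitfalls described later.

op11, by contrast, shows `onnx.Constant` as a node of its own, and opening `onnx.Resize` reveals the attributes `coordinate_transformation_mode`, `cubic_coeff_a` and `nearest_mode`. These attributes exist because op11 fully supports `F.interpolate`: `coordinate_transformation_mode` corresponds to `align_corners`, `cubic_coeff_a` to `mode=bicubic`, and `nearest_mode` to `mode=nearest`. Under INPUTS, the `sizes` entry has data type int64. The mismatch between float32 in op9/op10 and int64 in op11 is what causes the pitfall explained next.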
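
To make the role of `coordinate_transformation_mode` more concrete, the sketch below writes out, for a single axis, the index mappings that the ONNX `Resize` specification (opset 11) defines for four of its modes. The function name `source_coord` is hypothetical. Which mode the exporter actually picks for a given `align_corners` setting is not shown in this post, so it is best confirmed by opening the node in Netron.

```python
def source_coord(mode, x_resized, length_original, length_resized):
    """Map an output index back to an input coordinate along one axis."""
    scale = length_resized / length_original
    if mode == "half_pixel":
        return (x_resized + 0.5) / scale - 0.5
    if mode == "pytorch_half_pixel":
        # Same as half_pixel, except a length-1 output always maps to 0.
        return (x_resized + 0.5) / scale - 0.5 if length_resized > 1 else 0.0
    if mode == "align_corners":
        # First and last samples of input and output coincide.
        # Assumes length_resized > 1.
        return x_resized * (length_original - 1) / (length_resized - 1)
    if mode == "asymmetric":
        return x_resized / scale
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    # Resize an axis of length 5 to length 9 and show where each output
    # sample reads from in the input.
    for mode in ("pytorch_half_pixel", "align_corners", "asymmetric"):
        coords = [round(source_coord(mode, x, 5, 9), 3) for x in range(9)]
        print(mode, coords)
```

With `align_corners`, the last output index lands exactly on the last input index, whereas the half-pixel variants can produce coordinates outside the input range near the borders. That difference is why the two `align_corners` settings cannot share one fixed mapping.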

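2. Interpolation modes and opset versions

First, under op9/op10, `F.interpolate(mode=nearest)` works fine and raises no warning. `F.interpolate(mode=bilinear, align_corners=False)` also converts successfully, but emits the warning below. I suspect the warning is related to the output-size computation discussed further down; if you have other ideas, I would appreciate hearing them.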

You are trying to export the model with onnx:Upsample for ONNX opset version 9.

UserWarning: You are trying to export the model with onnx:Resize for ONNX opset version 10.

This operator might cause results to not match the expected results by PyTorch.
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator.

Next, with `F.interpolate(mode=bilinear, align_corners=True)`, the conversion fails with the following error:

UserWarning: ONNX export failed on upsample_bilinear2d because align_corners == True not supported

Under op11, neither the warning nor the error above appears. To summarize, ONNX 1.6 currently supports the following:

| F.interpolate | nearest | bilinear, align_corners=False | bilinear, align_corners=True | bicubic |
| --- | --- | --- | --- | --- |
| op-9 | Y | Y | N | N |
| op-10 | Y | Y | N | N |
| op-11 | Y | Y | Y | Y |
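
The table can be folded into a small lookup that picks the lowest opset for a given `F.interpolate` configuration. This is just a convenience sketch of the table as stated here; `min_opset_for_interpolate` is a hypothetical helper, not part of PyTorch or ONNX.

```python
def min_opset_for_interpolate(mode, align_corners=False):
    """Lowest opset (9, 10 or 11) that the table marks as supported."""
    if mode == "nearest":
        return 9
    if mode == "bilinear" and not align_corners:
        return 9  # exports, but with the mismatch warning quoted earlier
    if mode in ("bilinear", "bicubic"):
        return 11
    raise ValueError(f"mode not covered by the table: {mode}")


if __name__ == "__main__":
    print(min_opset_for_interpolate("bilinear", align_corners=True))
```

Even where opset 9 is listed as working, the exporter's own warning recommends opset 11, so the returned value is a lower bound rather than a recommendation.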

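So how does ONNX determine the output size after upsampling? As noted earlier, `scales` in op9/op10 is float32 and `sizes` in op11 is int64. `scales` and `sizes` keep coming up because they are tightly bound to the output-size computation.

For example, take a tensor with `input_size=[1, 3, 5, 5]` that we want to interpolate to `output_size=[1, 3, 9, 9]`. In op9/op10, `X` under INPUTS is the input tensor and `scales` is defined by `input_size * scales = output_size`. Because `scales` is float32, the resulting `output_size` is computed in floating point as well. Here `scales=[1., 1., 1.799..., 1.799...]` (the float32 value closest to 1.8 lies slightly below it), so the computed `output_size=[1, 3, 8.999..., 8.999...]`, and the expected `output_size` and the one ONNX computes disagree because of precision. A surprising result.

In op11, the extra `onnx.Constant` node supplies `sizes` under INPUTS, which is the `output_size` itself, and since `sizes` is int64, the op11 `output_size` matches what ONNX computes. I suspect this is one source of the warning `ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11.`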
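
The rounding effect in the 5 to 9 example can be reproduced with the standard library alone. The sketch below assumes that the exported scale is the float32 nearest to 9/5 and that the runtime takes the floor of `input_size * scale`, as the ONNX `Upsample`/`Resize` documentation describes for `scales`. Both points are assumptions about the exporter and runtime, not something verified in this post; `to_float32` is a hypothetical helper.

```python
import math
import struct


def to_float32(value):
    """Round a Python float to the nearest float32, returned as a float."""
    return struct.unpack("f", struct.pack("f", value))[0]


input_size = 5
target_size = 9

# The scale as it would be stored in a float32 constant.
scale = to_float32(target_size / input_size)
# Multiplying back does not recover 9 exactly.
product = input_size * scale

print(scale)                          # a value just under 1.8
print(product, math.floor(product))   # product is just under 9
```

Under these assumptions the floored size is one less than the intended 9, which is the kind of off-by-one the integer `sizes` input in op11 avoids.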