The Upsample Operation in ONNX and TensorRT

nnzzll

Categories: Model Deployment, TensorRT, Onnx. Tags: pytorch, deep learning, neural networks

Article link: https://blog.csdn.net/weixin_42694889/article/details/120264004

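Project scenario:

Deploying a TensorRT inference engine for UNet++.

Problem description:

The TensorRT (FP32) inference results differ greatly from the PyTorch inference results. Comparing the two with an ATOL of 1e-5 and an RTOL of 1e-3, the error rate exceeds 50%, which is unacceptable.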
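
The comparison above can be sketched as follows. This is a minimal illustration, assuming that "error rate" means the fraction of elements that fail an `np.isclose`-style check; the helper name `mismatch_ratio` and the toy arrays are hypothetical stand-ins for the real PyTorch and TensorRT outputs.

```python
import numpy as np


def mismatch_ratio(reference, candidate, atol=1e-5, rtol=1e-3):
    """Fraction of elements where |candidate - reference| > atol + rtol * |reference|."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    # np.isclose(a, b) tests |a - b| <= atol + rtol * |b|, so the reference goes second.
    close = np.isclose(candidate, reference, rtol=rtol, atol=atol)
    return 1.0 - close.mean()


# Toy data standing in for the flattened PyTorch (ref) and TensorRT (out) outputs.
ref = np.array([1.0, 2.0, 3.0, 4.0])
out = np.array([1.0, 2.0, 3.5, 4.0])
print(mismatch_ratio(ref, out))
```

In this toy case one element out of four is outside the tolerance, so the ratio is 0.25. A ratio above 0.5 on real outputs corresponds to the situation described here.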

Pytorch-1.8.0
TensorRT-7.2.3.4
cuda-11.1
cudnn-8.1.1

The TensorRT model is produced through the following conversion pipeline:
$\rm PyTorch \stackrel{torch.onnx.export}{\longrightarrow} Onnx \stackrel{tensorrt.OnnxParser}{\longrightarrow} TensorRT$.
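
A rough sketch of that pipeline is shown below. It is not the author's actual script: the function names, the `(1, 3, 512, 512)` input shape, the opset version and the workspace size are all assumptions made for illustration, and the TensorRT calls follow the 7.x Python API, parts of which were deprecated in later releases.

```python
def export_to_onnx(model, onnx_path, input_shape=(1, 3, 512, 512)):
    """Export a PyTorch module to an ONNX file with torch.onnx.export."""
    import torch  # imported lazily so the sketch can be loaded without PyTorch

    model.eval()
    dummy = torch.randn(*input_shape)  # assumed input shape, not from the article
    # opset 11 is assumed here because its Resize op has coordinate_transformation_mode.
    torch.onnx.export(model, dummy, onnx_path, opset_version=11, input_names=["input"])


def build_engine(onnx_path, workspace_bytes=1 << 30):
    """Parse an ONNX file with tensorrt.OnnxParser and build an FP32 engine (TensorRT 7.x API)."""
    import tensorrt as trt  # imported lazily so the sketch can be loaded without TensorRT

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flag)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))
    config = builder.create_builder_config()
    config.max_workspace_size = workspace_bytes  # assumed 1 GiB workspace
    return builder.build_engine(network, config)
```

A typical call sequence would be `export_to_onnx(model, "unetpp.onnx")` followed by `build_engine("unetpp.onnx")`, where the file name is again only a placeholder.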

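Root cause analysis:

The UNet++ model code follows the style of bigmb/Unet-Segmentation-Pytorch-Nest-of-Unets.
This problem did not occur when I deployed the plain UNet. After comparing the code of the two models, I tentatively narrowed the problem down to the upsampling operation.
In UNet I upsample with nn.ConvTranspose2d(kernel_size=(2, 2), stride=(2, 2)). Like nn.Conv2d(), this module is a trainable convolutional layer with parameters, so ONNX and TensorRT should optimize it the same way they optimize ordinary convolution layers.
UNet++ instead upsamples with nn.Upsample(scale_factor=2,mode='bilinear',align_corners=True). This module has no parameters and is not trainable; it is purely an interpolation function.

To test this hypothesis, I modified the model's forward pass to return the outputs of intermediate nodes.

Step 1. Return only the outputs of the four downsampling stages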

    def forward(self, x):
        x0_0 = self.conv0_0(x)
        x1_0 = self.conv1_0(self.pool(x0_0))
        x0_1 = self.conv0_1(torch.cat([x0_0, self.up(x1_0)], 1))

        x2_0 = self.conv2_0(self.pool(x1_0))
        x1_1 = self.conv1_1(torch.cat([x1_0, self.up(x2_0)], 1))
        x0_2 = self.conv0_2(torch.cat([x0_0, x0_1, self.up(x1_1)], 1))

        x3_0 = self.conv3_0(self.pool(x2_0))
        x2_1 = self.conv2_1(torch.cat([x2_0, self.up(x3_0)], 1))
        x1_2 = self.conv1_2(torch.cat([x1_0, x1_1, self.up(x2_1)], 1))
        x0_3 = self.conv0_3(torch.cat([x0_0, x0_1, x0_2, self.up(x1_2)], 1))

        x4_0 = self.conv4_0(self.pool(x3_0))
        x3_1 = self.conv3_1(torch.cat([x3_0, self.up(x4_0)], 1))
        x2_2 = self.conv2_2(torch.cat([x2_0, x2_1, self.up(x3_1)], 1))
        x1_3 = self.conv1_3(torch.cat([x1_0, x1_1, x1_2, self.up(x2_2)], 1))
        x0_4 = self.conv0_4(
            torch.cat([x0_0, x0_1, x0_2, x0_3, self.up(x1_3)], 1))
        output = self.final(x0_4)
        return [x1_0,x2_0,x3_0,x4_0]

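Comparing the TensorRT results with the PyTorch results shows that they agree. This indicates that TensorRT is consistent with PyTorch for Conv2d, ReLU and BatchNorm2d.

Step 2. Return the result of the first upsampling at each level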

    def forward(self, x):
        x0_0 = self.conv0_0(x)
        x1_0 = self.conv1_0(self.pool(x0_0))
        up1 = self.up(x1_0)
        x0_1 = self.conv0_1(torch.cat([x0_0, up1], 1))

        x2_0 = self.conv2_0(self.pool(x1_0))
        up2 = self.up(x2_0)
        x1_1 = self.conv1_1(torch.cat([x1_0, up2], 1))
        x0_2 = self.conv0_2(torch.cat([x0_0, x0_1, self.up(x1_1)], 1))

        x3_0 = self.conv3_0(self.pool(x2_0))
        up3 = self.up(x3_0)
        x2_1 = self.conv2_1(torch.cat([x2_0, up3], 1))
        x1_2 = self.conv1_2(torch.cat([x1_0, x1_1, self.up(x2_1)], 1))
        x0_3 = self.conv0_3(torch.cat([x0_0, x0_1, x0_2, self.up(x1_2)], 1))

        x4_0 = self.conv4_0(self.pool(x3_0))
        up4 = self.up(x4_0)
        x3_1 = self.conv3_1(torch.cat([x3_0, up4], 1))
        x2_2 = self.conv2_2(torch.cat([x2_0, x2_1, self.up(x3_1)], 1))
        x1_3 = self.conv1_3(torch.cat([x1_0, x1_1, x1_2, self.up(x2_2)], 1))
        x0_4 = self.conv0_4(
            torch.cat([x0_0, x0_1, x0_2, x0_3, self.up(x1_3)], 1))

        output = self.final(x0_4)
        return [up1, up2, up3, up4]

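This time the TensorRT results deviate substantially from the PyTorch results, with an error rate of around 50%, which shows that the problem does lie in nn.Upsample.

Solution:

After searching online, I found an existing issue describing the same problem. The reply from @ttyio explains it clearly: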

When we did not provide the scale input, TRT will take the
responsibility to calculate the scale, and we will check the
ResizeMode and AlignCorners, when alignCorners is true and resizeMode
is kLINEAR, the scale calculation behavior the same as
coordinate_transformation_mode set to align_corners in ONNX
(https://github.com/onnx/onnx/blob/master/docs/Operators.md#resize),
the formula is x_original = x_resized * (length_original - 1) /
(length_resized - 1), so the result is different than you directly set
scale to 2.

I have checked pytorch implementation, and you are right, when
align_corners = true, we should always use the output dims and input
dims to calculate the transformation_mode, else we will hit this
mismatch.

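In short:

When no scale input is provided to TensorRT's resize (Upsample) layer, TensorRT computes the scale itself.
When align_corners is True and the interpolation mode is linear, that computation behaves like ONNX's coordinate_transformation_mode set to align_corners, and its result differs from directly setting the scale to 2.
So when align_corners is True and the interpolation mode is linear, the coordinate transformation has to be derived from the input and output dimensions, which means specifying the output size of the interpolation; otherwise the mismatch described in this post appears.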
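
To make the difference concrete, here is a small 1-D NumPy sketch. It is only an illustration, not TensorRT's or PyTorch's actual code: the function `resize_linear_1d` is hypothetical, the `align_corners` branch uses the formula quoted above, and the other branch assumes a purely scale-driven mapping `x_original = x_resized / scale` (ONNX's `asymmetric` mode) as one example of what "directly setting the scale to 2" can mean.

```python
import numpy as np


def resize_linear_1d(x, out_len, align_corners):
    """Linearly resample a 1-D signal to out_len samples (out_len >= 2)."""
    x = np.asarray(x, dtype=np.float64)
    in_len = x.shape[0]
    if out_len < 2:
        raise ValueError("out_len must be at least 2")
    dst = np.arange(out_len, dtype=np.float64)
    if align_corners:
        # x_original = x_resized * (length_original - 1) / (length_resized - 1)
        src = dst * (in_len - 1) / (out_len - 1)
    else:
        # Assumed scale-driven mapping: x_original = x_resized / scale
        src = dst / (out_len / in_len)
    src = np.clip(src, 0, in_len - 1)      # stay inside the input signal
    lo = np.floor(src).astype(int)         # left neighbour index
    hi = np.minimum(lo + 1, in_len - 1)    # right neighbour index
    w = src - lo                           # interpolation weight of the right neighbour
    return (1 - w) * x[lo] + w * x[hi]


x = np.array([0.0, 3.0])
print(resize_linear_1d(x, 4, align_corners=True))
print(resize_linear_1d(x, 4, align_corners=False))
```

For this two-sample input the `align_corners` mapping yields the values 0, 1, 2, 3, while the scale-driven mapping yields 0, 1.5, 3, 3. Both outputs have the same length, but half of the elements differ, which is the kind of element-wise mismatch seen in Step 2.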

I therefore defined the model's four upsampling layers, one per resolution level, with explicit output sizes:

    self.up1 = nn.Upsample(size=(512, 512), mode='bilinear', align_corners=True)
    self.up2 = nn.Upsample(size=(256, 256), mode="bilinear", align_corners=True)
    self.up3 = nn.Upsample(size=(128, 128), mode="bilinear", align_corners=True)
    self.up4 = nn.Upsample(size=(64, 64), mode="bilinear", align_corners=True)
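
The four sizes above are consistent with a 512x512 input that is halved by each pooling step. Under that assumption (the input size is not stated explicitly in this post), a hypothetical helper such as `upsample_sizes` could derive them instead of hard-coding them:

```python
def upsample_sizes(input_hw, num_levels):
    """Target (height, width) of the upsampling layer at each level, finest level first.

    Assumes every pooling step halves the resolution exactly, so the input
    height and width must be divisible by 2 ** num_levels.
    """
    h, w = input_hw
    divisor = 2 ** num_levels
    if h % divisor or w % divisor:
        raise ValueError("input size must be divisible by 2 ** num_levels")
    # Level k upsamples back to the resolution of encoder level k: input / 2**k.
    return [(h // 2 ** k, w // 2 ** k) for k in range(num_levels)]


sizes = upsample_sizes((512, 512), 4)
print(sizes)  # [(512, 512), (256, 256), (128, 128), (64, 64)]
```

Each `self.up(...)` call in `forward` then has to use the layer whose size matches its target level, for example `self.up1` for every upsampling back to the 512x512 level. A fixed `size` also ties the exported model to one input resolution.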