Several problems encountered with TensorRT acceleration

This post describes the problems I ran into while converting a .pth model to ONNX and then optimizing it with TensorRT: the difficulty of writing a symbolic function for the custom LayerNormFunction, and how the Polygraphy tool was used to locate and fix NaN and precision issues, including conversion strategies for FP16 and INT8.

0x00 Converting the pth model to ONNX

import torch
from basicsr.models import create_model
from basicsr.train import parse_options
from basicsr.utils import FileClient, imfrombytes, img2tensor, padding, tensor2img, imwrite
import os


def pth_to_onnx(input, onnx_path, input_names=['input'], output_names=['output']):
    if not onnx_path.endswith('.onnx'):
        print("Warning! The onnx model name is not correct, "
              "please give a name that ends with '.onnx'!")
        return 0

    opt = parse_options(is_train=False)
    opt['dist'] = False
    model = create_model(opt)

    model.net_g.cuda()
    model.net_g.eval()

    with torch.no_grad():
        torch.onnx.export(model.net_g, input, onnx_path, opset_version=11, verbose=False,
                          input_names=input_names, output_names=output_names, do_constant_folding=True)
        # torch.onnx.export(model.net_g, input, onnx_path, opset_version=13, verbose=True,
        #                   input_names=input_names, output_names=output_names, operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)



if __name__ == '__main__':
    device = torch.device('cuda')

    onnx_path = './naf16net11_div.onnx'
    # input = torch.randn(1, 3, 480, 640, device="cuda")
    image_path = "left_194.jpg"

    file_client = FileClient('disk')

    img_bytes = file_client.get(image_path, None)
    try:
        img = imfrombytes(img_bytes, float32=True)
    except Exception:
        raise Exception("path {} not working".format(image_path))

    img = img2tensor(img, bgr2rgb=True, float32=True)
    input = img.unsqueeze(dim=0).to(device)

    pth_to_onnx(input, onnx_path)

At this point an undefined-operator error appeared. Inspecting the code showed that a custom autograd function was the cause: class LayerNormFunction(torch.autograd.Function).
Based on the forward function of LayerNormFunction, I wrote the corresponding symbolic function. The full code is as follows:

class LayerNormFunction(torch.autograd.Function):

    @staticmethod
    def symbolic(g, x, weight, bias, eps):
        # The forward function returns one output, so this function must also
        # return one output. This is from my own experiments; I did not find
        # it documented anywhere.
        # Compute y = (x - mu) / sqrt(var + eps)
        x_minus_mu = g.op("Sub", x, g.op("ReduceMean", x, axes_i=[1]))
        var = g.op("ReduceMean", g.op("Mul", x_minus_mu, x_minus_mu), axes_i=[1])
        # Use the eps argument instead of a hardcoded 1e-6
        var_plus_eps = g.op("Add", var, g.op("Constant", value_t=torch.tensor(eps, dtype=torch.float)))
        sqrt_var_plus_eps = g.op("Sqrt", var_plus_eps)
        y = g.op("Div", x_minus_mu, sqrt_var_plus_eps)

        # Attribute-style axes work for opset <= 12; opset 13+ passes axes as
        # an input tensor instead (see the commented-out variants).
        weight = g.op("Unsqueeze", weight, axes_i=[0, 2, 3])  # expand weight to match y's shape
        # weight = g.op("Unsqueeze", weight, g.op("Constant", value_t=torch.tensor([0, 2, 3], dtype=torch.long)))
        y = g.op("Mul", y, weight)
        bias = g.op("Unsqueeze", bias, axes_i=[0, 2, 3])  # expand bias to match y's shape
        # bias = g.op("Unsqueeze", bias, g.op("Constant", value_t=torch.tensor([0, 2, 3], dtype=torch.long)))
        y = g.op("Add", y, bias)

        return y


    @staticmethod
    def forward(ctx, x, weight, bias, eps=1e-6):
        ctx.eps = eps
        N, C, H, W = x.size()
        mu = x.mean(1, keepdim=True)
        var = (x - mu).pow(2).mean(1, keepdim=True)
        y = (x - mu) / (var + eps).sqrt()
        ctx.save_for_backward(y, var, weight)
        y = weight.view(1, C, 1, 1) * y + bias.view(1, C, 1, 1)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        eps = ctx.eps

        N, C, H, W = grad_output.size()
        y, var, weight = ctx.saved_tensors  # saved_variables is deprecated
        g = grad_output * weight.view(1, C, 1, 1)
        mean_g = g.mean(dim=1, keepdim=True)

        mean_gy = (g * y).mean(dim=1, keepdim=True)
        gx = 1. / torch.sqrt(var + eps) * (g - y * mean_gy - mean_g)
        return gx, (grad_output * y).sum(dim=3).sum(dim=2).sum(dim=0), grad_output.sum(dim=3).sum(dim=2).sum(
            dim=0), None

The symbolic function above can also be drafted with ChatGPT's help: paste in the forward function and ask it to write the matching symbolic.

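For torch.onnx.export to actually pick up the symbolic function, the op has to be invoked through LayerNormFunction.apply from inside a module. Below is a minimal sketch of such a wrapper; the name LayerNorm2d and its parameter layout are my own assumptions, mirroring how NAFNet-style code typically wraps this function:

import torch
import torch.nn as nn

class LayerNorm2d(nn.Module):
    """Hypothetical wrapper; during torch.onnx.export, calls that go through
    LayerNormFunction.apply are traced via the symbolic() defined above."""
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(channels))
        self.bias = nn.Parameter(torch.zeros(channels))
        self.eps = eps

    def forward(self, x):
        return LayerNormFunction.apply(x, self.weight, self.bias, self.eps)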

Reference blogs:
模型部署入门教程(四):在 PyTorch 中支持更多 ONNX 算子
pytorch模型转onnx格式,编写符号函数实现torch算子接口和onnx算子的映射,新建简单算子–模型部署记录整理

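Before moving on to TensorRT, it is worth a quick sanity check that the exported graph runs and stays finite. A minimal sketch with ONNX Runtime (pip install onnxruntime); the 'input'/'output' names come from the export script above, and the comparison against net_g is left commented out since it needs the full BasicSR setup:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("naf16net11_div.onnx", providers=["CPUExecutionProvider"])

# Build a dummy input matching the shape baked into the exported graph.
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

(onnx_out,) = sess.run(["output"], {inp.name: dummy})
print(onnx_out.shape, "finite:", np.isfinite(onnx_out).all())

# To compare against PyTorch (requires the BasicSR model from above):
# torch_out = model.net_g(torch.from_numpy(dummy).cuda()).detach().cpu().numpy()
# print("max abs diff:", np.abs(onnx_out - torch_out).max())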

0x01 Converting the ONNX model to TensorRT, I used:

trtexec --onnx=naf16net11_onnxsim.onnx --saveEngine=naf16net11_onnxsim.trt --fp16 --verbose
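Note that this command consumes naf16net11_onnxsim.onnx rather than the naf16net11_div.onnx exported above; presumably the exported graph was first passed through onnx-simplifier (my assumption, that step is not shown here):

onnxsim naf16net11_div.onnx naf16net11_onnxsim.onnx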
trtexec comes preinstalled on the NX board; on Windows, download and install TensorRT yourself. The directory D:\TensorRT-8.5.1.7\bin likewise contains trtexec.exe, and the command can be run directly from that directory.
To run TensorRT inside a conda environment on the NX board (the board ships with TensorRT preinstalled), see the blog Jetson NX实现TensorRT加速部署YOLOv8 and start from the relevant section.

Errors encountered:

0. trtexec --onnx=naf16net11_onnxsim.onnx --saveEngine=naf16net11_onnxsim.trt --int8 --verbose

When converting to INT8 or FP16, the printed outputs were all NaN. To track down which layer made the network produce NaN, I used the Polygraphy tool (either the CLI or the API) to convert and inspect the model layer by layer.
Reference blogs:
Polygraphy逐层对比onnx和tensorrt模型的输出
TensorRT部署YOLOv5出现精度问题的调查
Commands used:

polygraphy run naf16net11_onnxsim.onnx --onnxrt --onnx-outputs mark all --save-results=onnx_out.json
polygraphy run naf16net11_onnxsim.onnx --trt --trt-outputs mark all --save-results=trt_out.json --validate --fp16
polygraphy run naf16net11_onnxsim.onnx --trt --trt-outputs mark all --save-results=trt_out_8vv.json --validate --int8 -vv
--validate checks whether each layer's output contains Inf or NaN.
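Once the result files exist, they can also be scanned offline. A rough sketch using Polygraphy's Python API; I am assuming here that RunResults.load can read the JSON written by --save-results, that iterating a RunResults yields (runner_name, iteration_results) pairs, and that each iteration maps tensor names to numpy arrays, so check this against your Polygraphy version:

import numpy as np
from polygraphy.comparator import RunResults

# Scan every marked tensor in the saved TensorRT results for NaN/Inf.
results = RunResults.load("trt_out.json")
for runner_name, iterations in results:
    for iteration in iterations:
        for tensor_name, value in iteration.items():
            arr = np.asarray(value)
            if not np.isfinite(arr).all():
                print(f"{runner_name}: non-finite values in {tensor_name}")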

(0) When running polygraphy run, especially with --trt converting to FP16 or INT8, the following error appeared: Error loading "D:\ProgramData\Anaconda3\envs\yolov8\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.

In the conda environment I had installed cudatoolkit, cudnn and pytorch, and the error persisted across several cudatoolkit and pytorch versions. I then copied everything under D:\ProgramData\Anaconda3\envs\yolov8\Library\bin into D:\ProgramData\Anaconda3\envs\yolov8\lib\site-packages\torch\lib\, after which the error disappeared. Presumably the cudnn package installed by conda needs to replace the corresponding libraries shipped with torch; be sure to back up the original files under torch\lib before overwriting them. I could not find any other solution online.
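After copying the DLLs, a quick way to confirm that torch can load cuDNN again (standard torch APIs, nothing model-specific):

import torch

# torch.backends.cudnn.version() forces the cuDNN DLLs to load,
# so it fails loudly if the copied libraries are still broken.
print("CUDA build:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())
print("cuDNN available:", torch.backends.cudnn.is_available())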

(1) Back to the main topic: the layer-by-layer dump from polygraphy run naf16net11_onnxsim.onnx --trt --trt-outputs mark all --save-results=trt_out.json --validate --fp16 revealed the layers at fault.

For example, layer s.15 was problematic. Opening the corresponding ONNX model in Netron and cross-referencing the source code pointed to FP16 overflow. The original code was modified with scaling and range clamping; here the problem was at s = x.cumsum(dim=-1).cumsum_(dim=-2):

self.fp16_div = 1e6
self.fp16_mul = 5e5
..........
n, c, h, w = x.shape
x = x / self.fp16_div  # scale down
x = torch.clamp(x, min=-100, max=100)  # clamp the value range
s = x.cumsum(dim=-1).cumsum_(dim=-2)
s = s * self.fp16_mul

Reference blog: TensorRT debug及FP16浮点数溢出问题分析
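For context on why this overflows: FP16 tops out at 65504, and a double cumulative sum over a 480x640 map of values around 0.5 reaches roughly 0.5 * 480 * 640 ≈ 1.5e5 at the far corner. Below is a minimal, self-contained sketch of the scaled variant as a module; the class name ScaledCumsum is my own, and the constants are the ones from the snippet above:

import torch
import torch.nn as nn

class ScaledCumsum(nn.Module):
    """Hypothetical wrapper around the overflowing double cumsum."""
    def __init__(self, fp16_div=1e6, fp16_mul=5e5):
        super().__init__()
        self.fp16_div = fp16_div
        self.fp16_mul = fp16_mul

    def forward(self, x):
        x = x / self.fp16_div                  # shrink so the running sums stay in FP16 range
        x = torch.clamp(x, min=-100, max=100)  # bound any outliers
        s = x.cumsum(dim=-1).cumsum(dim=-2)    # the non-inplace form also exports more cleanly
        return s * self.fp16_mul               # restore the original scale


# Quick check in FP32 (the overflow itself only shows up in the FP16 engine):
x = torch.rand(1, 3, 480, 640)
print(ScaledCumsum()(x).abs().max())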
