TensorRT部署神经网络



Notes taken from an expert's talk.

Basics

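Much of what follows (FP16 mode, quantization-aware training, post-training quantization) rests on one primitive: mapping floating-point values to low-precision integers through a scale. A minimal pure-Python sketch of symmetric INT8 quantization, purely illustrative and not TensorRT code:

```python
# Minimal sketch of symmetric INT8 quantization: map real values to int8
# via a per-tensor scale chosen so the largest magnitude lands on 127.

def choose_scale(values):
    """Max calibration: the largest observed magnitude maps to 127."""
    return max(abs(v) for v in values) / 127.0

def quantize(v, scale):
    """Round to the nearest integer step and clamp to the int8 range."""
    return max(-128, min(127, round(v / scale)))

def dequantize(q, scale):
    return q * scale

activations = [0.02, -1.27, 0.64, 0.955, -0.31]
scale = choose_scale(activations)  # 1.27 / 127 = 0.01
roundtrip = [dequantize(quantize(v, scale), scale) for v in activations]

# For in-range values, the quantization error is bounded by half a step.
for v, r in zip(activations, roundtrip):
    assert abs(v - r) <= scale / 2 + 1e-12
```

FP16 works the same way in spirit (fewer bits per value, faster kernels), but needs no scale because it is still a floating-point format.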

TensorRT Usage Examples

Accelerating a Model with TensorRT

Example code

# ---------------------------------------------------------------
# This script shows how to use torch2trt to accelerate PyTorch inference.
# torch2trt's operator coverage is still limited, so do not try to run
# particularly exotic models; you can split the model into blocks to
# work around unsupported operators.

# You must install TensorRT first:
# https://github.com/NVIDIA-AI-IOT/torch2trt
# ---------------------------------------------------------------

import torch
import torch.profiler
import torch.utils.data
import torchvision
from torch2trt import torch2trt
from tqdm import tqdm

# build the model and some dummy input samples
SAMPLES = [torch.zeros(1, 3, 224, 224) for _ in range(1024)]
MODEL = torchvision.models.resnet18()
FP16_MODE = True

# The model has to be in eval mode and deployed to CUDA.
MODEL.eval()
MODEL.cuda()

# benchmark with PyTorch
with torch.no_grad():
    for sample in tqdm(SAMPLES, desc='Torch Executing'):
        MODEL(sample.cuda())

# convert the torch.nn.Module with TensorRT
# After conversion, the forward pass of your model is executed by TRT,
# and graph fusion is applied along the way.
model_trt = torch2trt(MODEL, [SAMPLES[0].cuda()], fp16_mode=FP16_MODE)
with torch.no_grad():
    for sample in tqdm(SAMPLES, desc='TRT Executing'):
        model_trt(sample.cuda())
print(isinstance(model_trt, torch.nn.Module))  # True: the wrapper is still an nn.Module

# profile the TRT model.
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(
        wait=2,
        warmup=1,
        active=7),
    # used when outputting traces for tensorboard
    on_trace_ready=torch.profiler.tensorboard_trace_handler('log')
    ) as p:
        for _ in range(10):
            model_trt(SAMPLES[0].cuda())
            # signal the profiler that the next iteration has started
            p.step()

# profile the original PyTorch model for comparison.
with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(
        wait=2,
        warmup=1,
        active=7),
    # used when outputting traces for tensorboard
    on_trace_ready=torch.profiler.tensorboard_trace_handler('log')
    ) as p:
        for _ in range(10):
            MODEL(SAMPLES[0].cuda())
            # signal the profiler that the next iteration has started
            p.step()

Before optimization

(profiler trace omitted)

After optimization

(profiler trace omitted)

With the graph fused, redundant kernels are removed and execution is faster.

TensorRT Quantization-Aware Training

Code

import torch
import torch.utils.data
import torchvision
from absl import logging

# install the pytorch-quantization library below first
from pytorch_quantization import nn as quant_nn

logging.set_verbosity(logging.FATAL)  # disable logging; it is too noisy in a notebook

from pytorch_quantization import quant_modules

# call quant_modules.initialize(),
# then train as you normally would ...
quant_modules.initialize()

model = torchvision.models.resnet50()
model.cuda()

# Quantization Aware Training is based on the Straight Through Estimator (STE)
# derivative approximation. It is sometimes known as "quantization aware
# training"; we don't use that name because it doesn't reflect the underlying
# assumption: if anything, the STE approximation makes training "unaware" of
# quantization.

# After calibration is done, Quantization Aware Training simply means selecting
# a training schedule and continuing to train the calibrated model.
# Usually it doesn't need to fine-tune very long. We usually use around 10% of
# the original training schedule, starting at 1% of the initial training
# learning rate, with a cosine annealing schedule that follows the decreasing
# half of a cosine period, down to 1% of the initial fine-tuning learning rate
# (0.01% of the initial training learning rate).

# Quantization Aware Training (essentially a discrete numerical optimization
# problem) is not a mathematically solved problem. Based on our experience,
# here are some recommendations:

# For the STE approximation to work well, it is better to use a small learning
# rate. A large learning rate is more likely to enlarge the variance introduced
# by the STE approximation and destroy the trained network.

# Do not change the quantization representation (scale) during training, at
# least not too frequently. Changing the scale every step is effectively like
# changing the data format (e8m7, e5m10, e3m4, etc.) every step, which will
# easily hurt convergence.

# https://github.com/NVIDIA/TensorRT/blob/main/tools/pytorch-quantization/examples/finetune_quant_resnet50.ipynb

def export_onnx(model, onnx_filename, batch_onnx):
    model.eval()
    quant_nn.TensorQuantizer.use_fb_fake_quant = True # We have to shift to pytorch's fake quant ops before exporting the model to ONNX
    opset_version = 13

    # Export ONNX for multiple batch sizes
    print("Creating ONNX file: " + onnx_filename)
    dummy_input = torch.randn(batch_onnx, 3, 224, 224, device='cuda') #TODO: switch input dims by model
    # note: the enable_onnx_checker argument was removed from torch.onnx.export in newer PyTorch
    torch.onnx.export(model, dummy_input, onnx_filename, verbose=False,
                      opset_version=opset_version, do_constant_folding=True)
    return True
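The STE comments above can be made concrete with a toy fake-quantization sketch. This is a pure-Python illustration only, not the pytorch_quantization API: the forward pass quantizes and immediately dequantizes so the network trains against quantization noise, while the backward pass pretends the rounding never happened.

```python
# Toy illustration of fake quantization with a straight-through estimator.

def fake_quant(x, scale, num_bits=8):
    """Forward: quantize-dequantize, so downstream layers see quantized values."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale

def fake_quant_grad(upstream_grad):
    """Backward (STE): treat round() as the identity, pass the gradient through."""
    return upstream_grad

y = fake_quant(0.503, scale=0.05)  # forward sees 0.5, the nearest quantized value
g = fake_quant_grad(1.0)           # backward ignores the rounding step entirely
```

The STE is what makes the scale-stability advice above matter: the gradient never "sees" the rounding, so frequent scale changes shift the forward pass under a backward pass that cannot account for them.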

TensorRT Post-Training Quantization (PPQ)

Quant with TensorRT OnnxParser

Quant with TensorRT API
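Post-training quantization needs no retraining: it runs calibration data through the model and derives scales from observed statistics. Below is a hypothetical sketch of the simplest scheme, max calibration; real tools (TensorRT's calibrator classes, PPQ) do this per layer and offer richer statistics such as entropy or percentile calibration.

```python
# Hypothetical sketch of a post-training "max calibration" pass: run
# calibration batches through, track the max magnitude seen, then derive
# a symmetric int8 scale from it.

class MaxCalibrator:
    def __init__(self):
        self.amax = 0.0

    def collect(self, batch):
        """Observe one calibration batch of activation values."""
        self.amax = max(self.amax, max(abs(v) for v in batch))

    def compute_scale(self):
        """Symmetric int8: the observed max magnitude maps to 127."""
        return self.amax / 127.0

calib = MaxCalibrator()
for batch in ([0.1, -0.4, 2.54], [1.9, -0.2]):  # stand-in calibration data
    calib.collect(batch)
scale = calib.compute_scale()  # 2.54 / 127 = 0.02
```

Max calibration is sensitive to outliers, which is why the fancier schemes exist: one extreme activation inflates the scale and wastes resolution on the common values.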

Improving Operator Compute Efficiency


Structures that can be fused

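Conv + BatchNorm is the classic fusable pair: at inference time BN is a fixed per-channel affine transform, so it can be folded into the conv's weights and bias ahead of time, removing a kernel entirely. A minimal sketch, modeling each channel as a scalar multiply for brevity (a real implementation folds the same factor into every weight of the channel's filter):

```python
import math

# Fold BatchNorm into the preceding conv at inference time.
# Per channel: conv(x) = w*x + b, bn(y) = gamma*(y - mean)/sqrt(var + eps) + beta.

def fold_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return fused (w', b') such that bn(conv(x)) == w'*x + b'."""
    g = gamma / math.sqrt(var + eps)
    return w * g, (b - mean) * g + beta

def conv(x, w, b):
    return w * x + b

def bn(y, gamma, beta, mean, var, eps=1e-5):
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

w, b = 0.8, 0.1
gamma, beta, mean, var = 1.2, -0.3, 0.05, 0.6
fw, fb = fold_conv_bn(w, b, gamma, beta, mean, var)

# The fused single op matches conv followed by BN at every input.
for x in (-1.0, 0.0, 2.5):
    assert abs(conv(x, fw, fb) - bn(conv(x, w, b), gamma, beta, mean, var)) < 1e-9
```

Activation functions like ReLU fuse the same way in spirit: the runtime applies them in the epilogue of the conv kernel instead of launching a separate elementwise kernel.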

Tensor Alignment

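Tensor alignment means padding tensor dimensions to hardware-friendly multiples so the fast kernels can run. A small sketch of the arithmetic; the multiples 8 and 32 below are typical for FP16 Tensor Cores and INT8 respectively, but the exact values depend on the GPU architecture:

```python
# Pad a channel count up to the nearest hardware-friendly multiple.

def pad_to_multiple(channels, multiple):
    """Smallest count >= channels that is divisible by `multiple`."""
    return channels + (-channels) % multiple

print(pad_to_multiple(3, 8))    # RGB input padded for FP16 Tensor Cores -> 8
print(pad_to_multiple(48, 32))  # 48 channels padded for INT8 kernels -> 64
```

The padded channels carry zeros, so a little memory and compute is traded for eligibility for the much faster kernels.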

Profiling

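The profiler schedule used earlier (wait=2, warmup=1, active=7) exists because the first iterations are not representative: caches are cold and lazy initialization is still running. The same idea applies to any hand-rolled benchmark; a hypothetical pure-Python harness:

```python
import time

# Minimal benchmarking harness mirroring the profiler schedule idea:
# skip `wait` iterations, discard `warmup` timings, keep `active` timings.

def benchmark(fn, wait=2, warmup=1, active=7):
    """Return per-iteration latencies (seconds) for the `active` window only."""
    timings = []
    for i in range(wait + warmup + active):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i >= wait + warmup:  # only steady-state iterations count
            timings.append(elapsed)
    return timings

latencies = benchmark(lambda: sum(range(10_000)))
avg_ms = 1000 * sum(latencies) / len(latencies)
```

For GPU code you would additionally need to synchronize (e.g., `torch.cuda.synchronize()`) before reading the clock, since kernel launches are asynchronous; torch.profiler handles CUDA timing for you.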

Custom Operators

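When TensorRT meets an operator it does not support, you supply a plugin (in the real API, a C++ `IPluginV2` implementation registered with the plugin registry). The toy sketch below only illustrates that dispatch idea; none of these names are TensorRT APIs:

```python
# Conceptual sketch: when the runtime hits an op it doesn't know,
# it falls back to a user-registered plugin implementation.

BUILTIN_OPS = {"relu": lambda xs: [max(x, 0.0) for x in xs]}
PLUGIN_REGISTRY = {}

def register_plugin(name, fn):
    PLUGIN_REGISTRY[name] = fn

def run_op(name, xs):
    if name in BUILTIN_OPS:
        return BUILTIN_OPS[name](xs)
    if name in PLUGIN_REGISTRY:  # fall back to the user-provided plugin
        return PLUGIN_REGISTRY[name](xs)
    raise NotImplementedError(f"unsupported op: {name}")

# A "custom op" the builtin set lacks, supplied as a plugin:
register_plugin("clip6", lambda xs: [min(max(x, 0.0), 6.0) for x in xs])
print(run_op("clip6", [-1.0, 3.0, 9.0]))  # [0.0, 3.0, 6.0]
```

This is also why splitting a model into blocks (as in the torch2trt example) works as a stopgap: the unsupported op simply stays in PyTorch instead of being dispatched through TRT.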
