TensorRT Optimization and Deployment (Part 1): TensorRT and ONNX Basics

小豆包的小朋友0217

Last modified 2024-01-16 21:08:20


Column: TensorRT Model Optimization and Deployment · Tags: python linux

First published 2024-01-04 16:27:22

Article link: https://blog.csdn.net/m0_70420861/article/details/135384649


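Series Table of Contents

Chapter 1 TensorRT Optimization and Deployment (1): TensorRT and ONNX Basics
Chapter 2 TensorRT Optimization and Deployment (2): Dissecting the ONNX Architecture
Chapter 3 TensorRT Optimization and Deployment (3): Registering ONNX Operators
Chapter 4 TensorRT Model Optimization and Deployment (4): Roofline Model
Chapter 5 TensorRT Model Optimization and Deployment (5): Key Points for Model Optimization and Deployment
Chapter 6 TensorRT Model Optimization and Deployment (6): Quantization Basics (1)
Chapter 7 TensorRT Model Optimization and Deployment (7): Quantization (PTQ and QAT) (2)
Chapter 8 TensorRT Model Optimization and Deployment (8): Model Pruning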

Preface

These are notes from self-studying a video course; more material on this topic will be added later.

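1. Purpose of Model Deployment

Model Training

$\left\{ \begin{array}{l} \text{The higher the accuracy, the better} \\ \text{Make the model deeper and wider} \\ \text{Use rich data augmentation} \\ \text{Use all kinds of training tricks} \end{array}\right.$

Model Deployment

$\left\{ \begin{array}{l} \text{Compress the model as much as possible with no (or minimal) accuracy loss} \\ \text{Reduce the amount of computation} \\ \text{Reduce memory access} \\ \text{Increase compute density} \end{array}\right.$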

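What model deployment cares about in autonomous driving:

RealTime (real-time performance)
Power Consumption (the more compute we need, the more power we consume)
Long-range Accuracy (accuracy on distant objects)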

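Attention:
TensorRT is convenient, but do not rely on it too much. Take the characteristics of the target hardware into account, run benchmarks and profiling, and learn to use profiling tools to find the model's compute bottlenecks and to analyze why the expected effect of an optimization diverges from the effect actually observed.

Benchmarking usually means performance testing of a software system (or part of one) to evaluate how well it performs.
Profiling means analyzing where the code spends its time and resources, in order to optimize it.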
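A minimal benchmarking helper, as a sketch of the "warm up, then time many runs and look at the distribution" habit. The function name `benchmark` and the reported statistics are illustrative choices, not part of any library. If `fn` launches GPU work (PyTorch CUDA or TensorRT), it has to synchronize (for example with `torch.cuda.synchronize()`) before returning, because kernel launches are asynchronous and the timer would otherwise only measure the launch.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> dict:
    """Time fn() and return latency statistics in milliseconds.

    Warm-up runs are discarded so that one-off costs (lazy initialisation,
    caches, JIT compilation) do not distort the measurement.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "p90_ms": samples[int(0.9 * (len(samples) - 1))],  # 90th-percentile sample
    }


if __name__ == "__main__":
    # Toy workload standing in for a model's forward pass
    print(benchmark(lambda: sum(i * i for i in range(10_000))))
```

Reporting the median and a tail percentile alongside the mean makes outliers (e.g. an occasional slow run) visible instead of averaging them away.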

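2. TensorRT Modules

TensorRT's internal optimization strategies are shown in the figure below:
[Figure: overview of TensorRT's internal optimization strategies]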

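2.1 Layer fusion

Layer fusion reduces kernel launch overhead and memory operations, which improves efficiency. In addition, after fusion some computations can be merged with other computations.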
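To make "fewer memory operations" concrete, here is a deliberately idealised traffic model (an assumption for illustration, ignoring caches): only the conv's output tensor is counted, each unfused element-wise op reads the previous result from memory and writes a new one, and a fused kernel applies the element-wise ops in registers and writes the result once. The function name and the 1x64x56x56 shape are hypothetical.

```python
def conv_output_traffic(num_elements, num_elementwise_ops, bytes_per_elem=4, fused=False):
    """Idealised memory traffic for a conv output followed by element-wise ops.

    The conv's own input and weight reads are identical in both cases,
    so they are left out of the count.
    """
    if fused:
        # Element-wise ops run on values still in registers; one write.
        elems_moved = num_elements
        launches = 1
    else:
        # Conv writes its output; each element-wise op reads + writes it again.
        elems_moved = num_elements * (1 + 2 * num_elementwise_ops)
        launches = 1 + num_elementwise_ops
    return {"bytes": elems_moved * bytes_per_elem, "kernel_launches": launches}


n = 64 * 56 * 56  # hypothetical conv output of shape 1x64x56x56, FP32
print(conv_output_traffic(n, 2))              # {'bytes': 4014080, 'kernel_launches': 3}
print(conv_output_traffic(n, 2, fused=True))  # {'bytes': 802816, 'kernel_launches': 1}
```

Under this model, fusing BN and ReLU into the conv cuts the traffic on that tensor by 5x and the launches from three to one; real savings depend on the hardware and on what the unfused kernels keep in cache.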

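1. Vertical layer fusion

Fuse conv + BN + ReLU into a single layer.

Step 1 / Step 2: [Figure: vertical fusion of conv + BN + ReLU]
Note: after fusion, conv + BN + ReLU takes about as long to compute as the conv alone.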
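Why BN can disappear into the conv: in inference mode BN is a fixed per-channel affine map, so it can be folded into the conv's weights and bias. The NumPy sketch below shows the standard folding formula on a 1x1 conv (the same per-output-channel scaling applies to k×k kernels); the helper names are illustrative, and this is not TensorRT's internal code.

```python
import numpy as np


def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the preceding conv.

    w: conv weights (C_out, C_in, kH, kW); b: conv bias (C_out,).
    BN computes gamma * (y - mean) / sqrt(var + eps) + beta per output
    channel. That is an affine map, so it can be absorbed into w and b.
    """
    scale = gamma / np.sqrt(var + eps)            # one factor per output channel
    w_fused = w * scale[:, None, None, None]      # rescale every output filter
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused


def conv1x1(x, w, b):
    """1x1 convolution on a (C_in, H, W) input, written as a matmul over channels."""
    return np.einsum("oc,chw->ohw", w[:, :, 0, 0], x) + b[:, None, None]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm applied per channel of a (C, H, W) tensor."""
    g, bt, m, v = (p[:, None, None] for p in (gamma, beta, mean, var))
    return g * (y - m) / np.sqrt(v + eps) + bt


rng = np.random.default_rng(0)
c_in, c_out = 3, 4
x = rng.standard_normal((c_in, 5, 5))
w = rng.standard_normal((c_out, c_in, 1, 1))
b = rng.standard_normal(c_out)
gamma, beta = rng.uniform(0.5, 1.5, c_out), rng.standard_normal(c_out)
mean, var = rng.standard_normal(c_out), rng.uniform(0.5, 2.0, c_out)

# Unfused: conv -> BN -> ReLU, three separate passes over the data
unfused = np.maximum(batchnorm(conv1x1(x, w, b), gamma, beta, mean, var), 0.0)

# Fused: one conv with folded weights; in a real fused kernel the ReLU is an
# epilogue of the same kernel (here it is simply applied to the result)
w_f, b_f = fold_bn_into_conv(w, b, gamma, beta, mean, var)
fused = np.maximum(conv1x1(x, w_f, b_f), 0.0)

print("outputs match:", np.allclose(unfused, fused))
```

Since the fused layer is just a conv with different numbers, its cost is essentially the conv's cost, which matches the note above.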

2. Horizontal layer fusion

Step 3: [Figure: horizontal layer fusion]
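Horizontal fusion targets layers that read the same input tensor and perform the same kind of operation with different weights. A NumPy sketch of the idea for two matmul branches (shapes are arbitrary illustrative values): concatenate the weights, run one wider GEMM, and split the result.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))      # batch of 8, 16 input features
w_a = rng.standard_normal((16, 32))   # branch A: 16 -> 32
w_b = rng.standard_normal((16, 24))   # branch B: 16 -> 24

# Unfused: two separate GEMMs (two kernel launches), each reading x
out_a = x @ w_a
out_b = x @ w_b

# Horizontally fused: concatenate weights along the output axis,
# run one wider GEMM, then split the result back into the two branches
w_ab = np.concatenate([w_a, w_b], axis=1)   # (16, 56)
out_ab = x @ w_ab
fused_a, fused_b = np.split(out_ab, [w_a.shape[1]], axis=1)

print(np.allclose(out_a, fused_a), np.allclose(out_b, fused_b))
```

One wide GEMM usually keeps the GPU busier than two narrow ones and reads `x` only once.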

2.2 Kernel auto-tuning

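For the same layer, TensorRT benchmarks many different kernel implementations and picks the fastest one.

For example, the matrix multiplication in an FC layer has many kernel functions depending on the tile size (e.g. 32x32, 32x64, 64x64, 64x128, 128x128), with different strategies for different hardware.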
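The "try every candidate, keep the fastest" loop can be sketched on the CPU with NumPy. Here each tile size of a blocked matmul plays the role of one kernel variant; `tiled_matmul` and `autotune` are hypothetical helpers. TensorRT instead times precompiled CUDA kernels (which it calls tactics) on the actual GPU, so the tile that wins here says nothing about what wins on a GPU.

```python
import time

import numpy as np


def tiled_matmul(a, b, tile):
    """C = A @ B computed block by block (stand-in for one kernel variant)."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                c[i:i + tile, j:j + tile] += a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
    return c


def autotune(a, b, tiles, repeats=3):
    """Time every candidate tile size; return (fastest_tile, seconds_per_call)."""
    timings = {}
    for tile in tiles:
        tiled_matmul(a, b, tile)                  # warm-up run
        start = time.perf_counter()
        for _ in range(repeats):
            tiled_matmul(a, b, tile)
        timings[tile] = (time.perf_counter() - start) / repeats
    best = min(timings, key=timings.get)
    return best, timings


rng = np.random.default_rng(0)
a = rng.standard_normal((128, 128)).astype(np.float32)
b = rng.standard_normal((128, 128)).astype(np.float32)
best, timings = autotune(a, b, tiles=[16, 32, 64, 128])
print("fastest tile:", best)
print("correct:", np.allclose(tiled_matmul(a, b, best), a @ b, atol=1e-3))
```

Every variant computes the same result; only speed differs, which is why the choice can be made purely by measurement.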

2.3 Quantization

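[Figure: quantization]
Quantization is a very important model-compression strategy: weights trained in single precision (FP32) are converted to half precision (FP16) or to integers (INT8, INT4).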
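A minimal sketch of one simple scheme, per-tensor symmetric INT8 quantization, next to an FP16 cast. This is only one possible scheme (INT4 and per-channel scales are not shown), and how TensorRT actually chooses scales (calibration for PTQ, or QAT) is covered in the later quantization chapters.

```python
import numpy as np


def quantize_int8_symmetric(x):
    """Per-tensor symmetric INT8 quantization: x ≈ scale * q with q in [-127, 127]."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0                      # all-zero tensor: any scale works
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to (approximate) FP32 values."""
    return q.astype(np.float32) * np.float32(scale)


rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)   # FP32 "weights"

q, scale = quantize_int8_symmetric(w)
w_int8 = dequantize(q, scale)
w_fp16 = w.astype(np.float16).astype(np.float32)   # FP16 round trip

print("INT8 max abs error:", np.abs(w - w_int8).max(), "bound scale/2:", scale / 2)
print("FP16 max abs error:", np.abs(w - w_fp16).max())
print("bytes FP32/FP16/INT8:", w.nbytes, w.astype(np.float16).nbytes, q.nbytes)
```

The rounding error of each INT8 value is at most half a quantization step (`scale / 2`), while storage drops to a quarter of FP32.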

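3. What Is ONNX?

ONNX is a neural-network model format that serializes the model in binary form with Protobuf (Protocol Buffers). Protobuf serializes and stores data according to user-defined data structures (message schemas).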
Below are some simple examples. Code link:

3.1 example.py

import torch
import torch.nn as nn
import torch.onnx

class Model(torch.nn.Module):
    def __init__(self, in_features, out_features, weights, bias=False):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias)
        with torch.no_grad():
            self.linear.weight.copy_(weights)
    
    def forward(self, x):
        x = self.linear(x)
        return x

def infer():
    in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ],dtype=torch.float32)
    
    model = Model(4, 3, weights)
    x = model(in_features)
    print("result is: ", x)

def export_onnx():
    input   = torch.zeros(1, 1, 1, 4)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ],dtype=torch.float32)
    model   = Model(4, 3, weights)
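    model.eval()  # switch to inference mode (Dropout/BatchNorm behave as at inference time)

    # Export the PyTorch model to ONNX; export() takes many arguments and also supports dynamic shapes
    # Key part (the ../models/ directory must already exist) ==========
    torch.onnx.export(
        model         = model, 
        args          = (input,),
        f             = "../models/example.onnx",
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)  # ONNX opset version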
    print("Finished onnx export")


if __name__ == "__main__":
    infer()
    export_onnx()

Output:
[Figure: console output of example.py]
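The printed result can be checked by hand. With `bias=False`, `nn.Linear` computes $y = xW^T$, which for the single input vector $x = [1, 2, 3, 4]$ used in `infer()` is $Wx$:

$$
y = Wx =
\begin{bmatrix}1 & 2 & 3 & 4\\ 2 & 3 & 4 & 5\\ 3 & 4 & 5 & 6\end{bmatrix}
\begin{bmatrix}1\\ 2\\ 3\\ 4\end{bmatrix}
=
\begin{bmatrix}1+4+9+16\\ 2+6+12+20\\ 3+8+15+24\end{bmatrix}
=
\begin{bmatrix}30\\ 40\\ 50\end{bmatrix}
$$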

Opened in Netron:
[Figure: example.onnx opened in Netron]

3.2 example_two_head.py (two outputs)

import torch
import torch.nn as nn
import torch.onnx
# Model definition =================
class Model(torch.nn.Module):
    def __init__(self, in_features, out_features, weights1, weights2, bias=False):
        super().__init__()
        self.linear1 = nn.Linear(in_features, out_features, bias)
        self.linear2 = nn.Linear(in_features, out_features, bias)
        with torch.no_grad():
            self.linear1.weight.copy_(weights1)
            self.linear2.weight.copy_(weights2)

    
    def forward(self, x):
        x1 = self.linear1(x)
        x2 = self.linear2(x)
        return x1, x2
# Model inference =====================
def infer():
    in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    weights1 = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ],dtype=torch.float32)
    weights2 = torch.tensor([
        [2, 3, 4, 5],
        [3, 4, 5, 6],
        [4, 5, 6, 7]
    ],dtype=torch.float32)
    
    model = Model(4, 3, weights1, weights2)
    x1, x2 = model(in_features)
    print("result is: \n")
    print(x1)
    print(x2)
# Export to ONNX
def export_onnx():
    input    = torch.zeros(1, 1, 1, 4)
    weights1 = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ],dtype=torch.float32)
    weights2 = torch.tensor([
        [2, 3, 4, 5],
        [3, 4, 5, 6],
        [4, 5, 6, 7]
    ],dtype=torch.float32)
    model   = Model(4, 3, weights1, weights2)
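    model.eval()  # switch to inference mode (Dropout/BatchNorm behave as at inference time)

    # Export the PyTorch model to ONNX; export() takes many arguments and also supports dynamic shapes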
    torch.onnx.export(
        model         = model, 
        args          = (input,),
        f             = "../models/example_two_head.onnx",
        input_names   = ["input0"],
        output_names  = ["output0", "output1"],
        opset_version = 12)
    print("Finished onnx export")


if __name__ == "__main__":
    infer()
    export_onnx()

[Figure: result of example_two_head.py]
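The two printed results can also be checked by hand. `x1` uses the same weights as example.py, so it is $[30, 40, 50]$; the second head gives:

$$
y_2 = W_2 x =
\begin{bmatrix}2 & 3 & 4 & 5\\ 3 & 4 & 5 & 6\\ 4 & 5 & 6 & 7\end{bmatrix}
\begin{bmatrix}1\\ 2\\ 3\\ 4\end{bmatrix}
=
\begin{bmatrix}2+6+12+20\\ 3+8+15+24\\ 4+10+18+28\end{bmatrix}
=
\begin{bmatrix}40\\ 50\\ 60\end{bmatrix}
$$

Both heads read the same input with the same kind of operation, which is exactly the pattern horizontal layer fusion (section 2.1) targets.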

3.3 Dynamic batch: example_dynamic_shape.py

import torch
import torch.nn as nn
import torch.onnx

class Model(torch.nn.Module):
    def __init__(self, in_features, out_features, weights, bias=False):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias)
        with torch.no_grad():
            self.linear.weight.copy_(weights)
    
    def forward(self, x):
        x = self.linear(x)
        return x

def infer():
    in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ],dtype=torch.float32)

    model = Model(4, 3, weights)
    x = model(in_features)
    print("result of input shape [4] is ", x.data)

def export_onnx():
    input   = torch.zeros(1, 1, 1, 4)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ],dtype=torch.float32)
    model   = Model(4, 3, weights)
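    model.eval()  # switch to inference mode (Dropout/BatchNorm behave as at inference time)

    # Export the PyTorch model to ONNX; export() takes many arguments and also supports dynamic shapes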
    torch.onnx.export(
        model         = model, 
        args          = (input,),
        f             = "../models/example_dynamic_shape.onnx",
        input_names   = ["input0"],
        output_names  = ["output0"],
        ###################### this is the changed part ######################
        dynamic_axes  = {
            'input0':  {0: 'batch'},
            'output0': {0: 'batch'}
        },  # axis 0 becomes a symbolic 'batch' dimension, so any batch size can be fed at inference time
        ####################################################################
        opset_version = 12)
    print("Finished onnx export")



if __name__ == "__main__":
    infer()
    export_onnx()

[Figure: result of example_dynamic_shape.py]

3.5 sample_cbr.py

import torch
import torch.nn as nn
import torch.onnx

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
        self.bn1   = nn.BatchNorm2d(num_features=16)
        self.act1  = nn.ReLU()
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        return x

def export_norm_onnx():
    input   = torch.rand(1, 3, 5, 5)
    model   = Model()
    model.eval()

    # This example shows that some nodes are already fused during ONNX export:
    # the BatchNorm node is gone
    file    = "../models/sample-cbr.onnx"
    torch.onnx.export(
        model         = model, 
        args          = (input,),
        f             = file,
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)
    print("Finished normal onnx export")

if __name__ == "__main__":
    export_norm_onnx()

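[Figure: exported sample-cbr.onnx graph]
We can see that BatchNormalization has disappeared, which means it was already fused into the Conv when PyTorch exported the ONNX model.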
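PyTorch also exposes this folding directly as `torch.nn.utils.fusion.fuse_conv_bn_eval`, which returns a new `Conv2d` with the BN folded in. The sketch below applies it to the same Conv2d(3, 16, 3) + BatchNorm2d(16) layers as `sample_cbr.py` and checks the outputs match. BN statistics are randomised only so that the comparison is not trivial. The folding is valid only in inference mode, where BN uses its fixed running statistics, which is why both modules are put in `eval()`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3).eval()
bn = nn.BatchNorm2d(num_features=16).eval()

# Give BN non-trivial statistics so that the comparison is meaningful
with torch.no_grad():
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-0.5, 0.5)

fused_conv = fuse_conv_bn_eval(conv, bn)   # new Conv2d with BN folded in

x = torch.rand(1, 3, 5, 5)
with torch.no_grad():
    reference = torch.relu(bn(conv(x)))    # Conv -> BatchNorm -> ReLU
    fused_out = torch.relu(fused_conv(x))  # folded Conv -> ReLU

print("max abs diff:", (reference - fused_out).abs().max().item())
```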

3.6 sample_reshape.py

import torch
import torch.nn as nn
import torch.onnx
import onnxsim
import onnx

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1   = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.bn1     = nn.BatchNorm2d(num_features=16)
        self.act1    = nn.ReLU()
        self.conv2   = nn.Conv2d(in_channels=16, out_channels=64, kernel_size=5, padding=2)
        self.bn2     = nn.BatchNorm2d(num_features=64)
        self.act2    = nn.ReLU()
        self.avgpool = nn.AdaptiveAvgPool1d(1)
        self.head    = nn.Linear(in_features=64, out_features=10)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.act2(x) 
        x = torch.flatten(x, 2, 3)  # B, C, H, W -> B, C, L (this step produces a Shape -> Slice -> Concat -> Reshape chain of nodes)


        # Equivalent alternatives:
        # b, c, h, w = x.shape
        # x = x.reshape(b, c, h * w)
        # x = x.view(b, c, -1)

        x = self.avgpool(x)         # B, C, L    -> B, C, 1
        x = torch.flatten(x, 1)     # B, C, 1    -> B, C
        x = self.head(x)            # B, C      -> B, 10
        return x

def export_norm_onnx():
    input   = torch.rand(1, 3, 64, 64)
    model   = Model()
    model.eval()  # export in inference mode (BatchNorm uses its running statistics)
    file    = "../models/sample-reshape.onnx"
    torch.onnx.export(
        model         = model, 
        args          = (input,),
        f             = file,
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)
    print("Finished normal onnx export")

    model_onnx = onnx.load(file)

    # Validate the loaded ONNX model
    onnx.checker.check_model(model_onnx)


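    # Simplify the ONNX model with onnx-simplifier.
    # Try commenting the simplification out and compare the flatten part before and after.
    # An exported ONNX graph often contains constant values and nodes that need no graph tracing;
    # open the model in Netron to see what these nodes are doing.
    # ============ To use onnx-simplifier, uncomment the code below ============#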
    # print(f"Simplifying with onnx-simplifier {onnxsim.__version__}...")
    # model_onnx, check = onnxsim.simplify(model_onnx)
    # assert check, "assert check failed"
    onnx.save(model_onnx, file)

if __name__ == "__main__":
    export_norm_onnx()

[Figure: exported sample-reshape.onnx graph]

3.7 load_torchvision.py

Shows how to convert models built into torchvision to ONNX.

import torch
import torchvision
import onnxsim
import onnx
import argparse

def get_model(type, dir):
    if type == "resnet":
        model = torchvision.models.resnet50()
        file  = dir + "resnet50.onnx"
    elif type == "vgg":
        model = torchvision.models.vgg11()
        file  = dir + "vgg11.onnx"
    elif type == "mobilenet":
        model = torchvision.models.mobilenet_v3_small()
        file  = dir + "mobilenetV3.onnx"
    elif type == "efficientnet":
        model = torchvision.models.efficientnet_b0()
        file  = dir + "efficientnetb0.onnx"
    elif type == "efficientnetv2":
        model = torchvision.models.efficientnet_v2_s()
        file  = dir + "efficientnetV2.onnx"
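    elif type == "regnet":
        model = torchvision.models.regnet_x_1_6gf()
        file  = dir + "regnet1.6gf.onnx"
    else:
        # without this branch an unknown type would fail later with UnboundLocalError
        raise ValueError(f"unsupported model type: {type}")
    return model, file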

def export_norm_onnx(model, file, input):
    
    torch.onnx.export(
        model         = model, 
        args          = (input,),
        f             = file,
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)
    print("Finished normal onnx export")

    model_onnx = onnx.load(file)

    # Validate the loaded ONNX model
    onnx.checker.check_model(model_onnx)

    # Simplify the ONNX model with onnx-simplifier
    print(f"Simplifying with onnx-simplifier {onnxsim.__version__}...")
    model_onnx, check = onnxsim.simplify(model_onnx)
    assert check, "assert check failed"
    onnx.save(model_onnx, file)


def main(args):
    type        = args.type
    dir         = args.dir
    input       = torch.rand(1, 3, 224, 224)
    model, file = get_model(type, dir)
    model.eval()  # export in inference mode (BatchNorm/Dropout behave as at inference time)

    export_norm_onnx(model, file, input)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--type", type=str, default="vgg")
    parser.add_argument("-d", "--dir", type=str, default="../models/")
    opt = parser.parse_args()
    main(opt)