Series Table of Contents
Chapter 1: TensorRT Optimization and Deployment (1): TensorRT and ONNX Basics
Chapter 2: TensorRT Optimization and Deployment (2): Dissecting the ONNX Architecture
Chapter 3: TensorRT Optimization and Deployment (3): Registering ONNX Operators
Chapter 4: TensorRT Model Optimization and Deployment (4): The Roofline Model
Chapter 5: TensorRT Model Optimization and Deployment (5): Key Points for Model Optimization and Deployment
Chapter 6: TensorRT Model Optimization and Deployment (6): Quantization Basics (Part 1)
Chapter 7: TensorRT Model Optimization and Deployment (7): Quantization (PTQ and QAT) (Part 2)
Chapter 8: TensorRT Model Optimization and Deployment (8): Model Pruning
Preface
These are notes from self-study videos; the topics will be supplemented later.
1. The Purpose of Model Deployment
Model training (Training):
- the higher the accuracy, the better
- make the model deeper and wider
- use rich data augmentation
- use all kinds of training tricks
Model deployment (Deploy):
- compress the model as much as possible while keeping accuracy unchanged, or dropping it only very slightly
- reduce the amount of computation
- reduce memory access
- increase compute density (see the aside below)
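A brief aside (a standard definition, not from the original video): "compute density" is usually quantified as arithmetic intensity, which the Roofline-model chapter later in this series builds on:

$$\text{arithmetic intensity} = \frac{\text{FLOPs performed}}{\text{bytes of memory accessed}} \quad [\mathrm{FLOP/byte}]$$

The higher this ratio, the more a kernel is limited by compute rather than by memory bandwidth.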
What model deployment in autonomous driving cares about:
- RealTime (real-time performance)
- Power Consumption (the more compute we need, the more power we consume)
- Long-range Accuracy (accuracy at long distances)
Attention:
Although TensorRT is very convenient, you should not rely on it blindly. You need to take the characteristics of the deployment hardware into account, run some benchmarks and profiling, and learn to use profiling tools to analyze the model's compute bottlenecks and to understand why the predicted optimization strategy diverges from the one actually applied. (A minimal benchmark sketch follows this list.)
- Benchmarking usually means performance-testing a software system (or part of one) to evaluate how it performs.
- Profiling means analyzing where the time goes in the code so that it can be optimized.
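As a hedged illustration of the benchmarking side (plain PyTorch on CPU, not a TensorRT profile; resnet50 is just an arbitrary example model):

import time
import torch
import torchvision

model = torchvision.models.resnet50().eval()
x = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    for _ in range(10):                  # warm-up iterations
        model(x)
    start = time.perf_counter()
    for _ in range(100):                 # timed iterations
        model(x)
    elapsed = time.perf_counter() - start

print(f"average latency: {elapsed / 100 * 1000:.2f} ms")

For GPU timing you would additionally need torch.cuda.synchronize() around the timed region, and for actual TensorRT engines you would use TensorRT's own profiling tooling instead.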
2. TensorRT Modules
The various optimization strategies inside TensorRT are shown in the figure below (figure from the original post):
2.1 Layer fusion
Layer fusion reduces the overhead of launching kernels and the number of memory operations, which improves efficiency. At the same time, after fusion some computations can be merged with other computations.
1. Vertical layer fusion
Fuse conv + BN + ReLU into a single layer.
(Figures: Step 1 and Step 2 of the fusion.)
Note: the compute time of the fused conv + BN + ReLU is roughly the same as that of conv alone. A sketch of the underlying math follows.
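As a hedged sketch of what folding BN into conv means mathematically (fuse_conv_bn is a hypothetical helper for illustration, not TensorRT's API): BN in inference mode is just a per-channel affine transform, so it can be absorbed into the conv weights.

import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Fold the (inference-mode) BN affine transform into the conv:
    #   w' = w * gamma / sqrt(var + eps)
    #   b' = (b - mean) * gamma / sqrt(var + eps) + beta
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused

# Quick check: the fused conv matches conv followed by BN (in eval mode).
conv, bn = nn.Conv2d(3, 16, 3).eval(), nn.BatchNorm2d(16).eval()
x = torch.rand(1, 3, 5, 5)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))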
2. Horizontal layer fusion
Horizontal fusion merges layers that take the same input and perform the same kind of operation (for example, parallel convolutions branching off one tensor) into a single wider kernel.
(Figure: Step 3 of the fusion.)
2.2 Kernel auto-tuning
For the same layer, TensorRT internally benchmarks many different kernel functions and keeps the fastest.
For example, for the matrix multiplication in an FC layer there are many kinds of kernel functions depending on the tile size (e.g., 32x32, 32x64, 64x64, 64x128, 128x128), and the strategy differs per hardware.
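A hedged toy sketch of the auto-tuning idea (this is not TensorRT's implementation, just the select-the-fastest-candidate pattern, illustrated with three equivalent ways to compute an FC layer in PyTorch):

import time
import torch

x = torch.rand(1, 512)
w = torch.rand(1000, 512)

# Three equivalent "kernels" for the same FC computation.
candidates = {
    "matmul": lambda: x @ w.T,
    "linear": lambda: torch.nn.functional.linear(x, w),
    "einsum": lambda: torch.einsum("bi,oi->bo", x, w),
}

timings = {}
for name, fn in candidates.items():
    fn()                                 # warm-up
    start = time.perf_counter()
    for _ in range(1000):
        fn()
    timings[name] = time.perf_counter() - start

best = min(timings, key=timings.get)
print(timings, "-> picked:", best)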
2.3 Quantization
Quantization is one of the most important strategies for compressing a model: it converts the single-precision (FP32) training weights into half precision (FP16) or integer types (INT8, INT4). A minimal sketch follows.
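As a hedged sketch of the basic idea (simple symmetric per-tensor INT8 quantization; TensorRT's calibration is more involved, and the later quantization chapters cover it properly):

import torch

x = torch.randn(4, 4)                    # pretend these are FP32 weights

# Symmetric quantization: scale = max|x| / 127, q = round(x / scale).
scale = x.abs().max() / 127
q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)

# Dequantize and look at the rounding error that quantization introduces.
x_hat = q.float() * scale
print("max abs error:", (x - x_hat).abs().max().item())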
3. What is ONNX?
ONNX is a format for neural networks that serializes the model in Protobuf binary form. Protobuf serializes and stores the data according to user-defined data structures.
Below are some simple examples. Code link:
3.1 example.py
import torch
import torch.nn as nn
import torch.onnx

class Model(torch.nn.Module):
    def __init__(self, in_features, out_features, weights, bias=False):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias)
        with torch.no_grad():
            self.linear.weight.copy_(weights)

    def forward(self, x):
        x = self.linear(x)
        return x

def infer():
    in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ], dtype=torch.float32)

    model = Model(4, 3, weights)
    x = model(in_features)
    print("result is: ", x)

def export_onnx():
    input = torch.zeros(1, 1, 1, 4)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ], dtype=torch.float32)

    model = Model(4, 3, weights)
    model.eval()  # switch to eval mode so the weights are not updated further

    # This is how PyTorch exports to ONNX; there are many more parameters,
    # and dynamic sizes are also supported.
    # Key part ===========================================
    torch.onnx.export(
        model         = model,
        args          = (input,),
        f             = "../models/example.onnx",
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)  # ONNX opset version
    print("Finished onnx export")

if __name__ == "__main__":
    infer()
    export_onnx()
Output:
The exported model opened with Netron:
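As a hedged extra (not part of the original notes): the exported file can be sanity-checked with onnxruntime, assuming it is installed; the result should match the PyTorch output above.

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("../models/example.onnx")
x = np.array([[[[1, 2, 3, 4]]]], dtype=np.float32)

# Feed the tensor under the name given at export time ("input0").
out = sess.run(["output0"], {"input0": x})[0]
print(out)  # should match the PyTorch result: [30., 40., 50.]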
3.2 example_two_head.py (two outputs)

import torch
import torch.nn as nn
import torch.onnx

# Model definition =================
class Model(torch.nn.Module):
    def __init__(self, in_features, out_features, weights1, weights2, bias=False):
        super().__init__()
        self.linear1 = nn.Linear(in_features, out_features, bias)
        self.linear2 = nn.Linear(in_features, out_features, bias)
        with torch.no_grad():
            self.linear1.weight.copy_(weights1)
            self.linear2.weight.copy_(weights2)

    def forward(self, x):
        x1 = self.linear1(x)
        x2 = self.linear2(x)
        return x1, x2

# Model inference =====================
def infer():
    in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    weights1 = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ], dtype=torch.float32)
    weights2 = torch.tensor([
        [2, 3, 4, 5],
        [3, 4, 5, 6],
        [4, 5, 6, 7]
    ], dtype=torch.float32)

    model = Model(4, 3, weights1, weights2)
    x1, x2 = model(in_features)
    print("result is: \n")
    print(x1)
    print(x2)

# Export to ONNX
def export_onnx():
    input = torch.zeros(1, 1, 1, 4)
    weights1 = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ], dtype=torch.float32)
    weights2 = torch.tensor([
        [2, 3, 4, 5],
        [3, 4, 5, 6],
        [4, 5, 6, 7]
    ], dtype=torch.float32)

    model = Model(4, 3, weights1, weights2)
    model.eval()  # switch to eval mode so the weights are not updated further

    # Same export call as before; note the two output names, one per head.
    torch.onnx.export(
        model         = model,
        args          = (input,),
        f             = "../models/example_two_head.onnx",
        input_names   = ["input0"],
        output_names  = ["output0", "output1"],
        opset_version = 12)
    print("Finished onnx export")

if __name__ == "__main__":
    infer()
    export_onnx()
3.3 Dynamic batch: example_dynamic_shape.py

import torch
import torch.nn as nn
import torch.onnx

class Model(torch.nn.Module):
    def __init__(self, in_features, out_features, weights, bias=False):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias)
        with torch.no_grad():
            self.linear.weight.copy_(weights)

    def forward(self, x):
        x = self.linear(x)
        return x

def infer():
    in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ], dtype=torch.float32)

    model = Model(4, 3, weights)
    x = model(in_features)
    print("result of {1, 1, 1, 4} is ", x.data)

def export_onnx():
    input = torch.zeros(1, 1, 1, 4)
    weights = torch.tensor([
        [1, 2, 3, 4],
        [2, 3, 4, 5],
        [3, 4, 5, 6]
    ], dtype=torch.float32)

    model = Model(4, 3, weights)
    model.eval()  # switch to eval mode so the weights are not updated further

    # Same export call as before, plus dynamic_axes for a dynamic batch size.
    torch.onnx.export(
        model         = model,
        args          = (input,),
        f             = "../models/example_dynamic_shape.onnx",
        input_names   = ["input0"],
        output_names  = ["output0"],
        ###################### the changed part ###########################
        dynamic_axes  = {
            'input0': {0: 'batch'},
            'output0': {0: 'batch'}
        },  # axis 0 of the input and output is left dynamic ("batch")
        ####################################################################
        opset_version = 12)
    print("Finished onnx export")

if __name__ == "__main__":
    infer()
    export_onnx()
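A hedged add-on (assuming onnxruntime is installed): feeding two different batch sizes confirms that axis 0 really is dynamic.

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("../models/example_dynamic_shape.onnx")
for batch in (1, 4):
    x = np.zeros((batch, 1, 1, 4), dtype=np.float32)
    out = sess.run(["output0"], {"input0": x})[0]
    print(f"batch={batch} -> output shape: {out.shape}")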
3.5 sample_cbr.py
import torch
import torch.nn as nn
import torch.onnx

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
        self.bn1   = nn.BatchNorm2d(num_features=16)
        self.act1  = nn.ReLU()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        return x

def export_norm_onnx():
    input = torch.rand(1, 3, 5, 5)
    model = Model()
    model.eval()

    # This example shows that some nodes are already fused during the ONNX
    # export itself: the BatchNorm node disappears from the graph.
    file = "../models/sample-cbr.onnx"
    torch.onnx.export(
        model         = model,
        args          = (input,),
        f             = file,
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)
    print("Finished normal onnx export")

if __name__ == "__main__":
    export_norm_onnx()
We can see that the BatchNormalization node is gone, which means it was already fused when torch exported the ONNX. This can also be confirmed programmatically, as the sketch below shows.
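A hedged sketch (assuming the onnx package is installed): list the op types in the exported graph. Because the model was exported in eval mode, BatchNorm is folded away and only Conv and Relu should remain.

import onnx

model = onnx.load("../models/sample-cbr.onnx")
print([node.op_type for node in model.graph.node])  # e.g. ['Conv', 'Relu']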
3.6 sample_reshape.py
import torch
import torch.nn as nn
import torch.onnx
import onnxsim
import onnx

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1   = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.bn1     = nn.BatchNorm2d(num_features=16)
        self.act1    = nn.ReLU()
        self.conv2   = nn.Conv2d(in_channels=16, out_channels=64, kernel_size=5, padding=2)
        self.bn2     = nn.BatchNorm2d(num_features=64)
        self.act2    = nn.ReLU()
        self.avgpool = nn.AdaptiveAvgPool1d(1)
        self.head    = nn.Linear(in_features=64, out_features=10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.act2(x)
        # B, C, H, W -> B, C, L  (this step produces a whole chain of
        # shape -> slice -> concat -> reshape nodes in the exported graph)
        x = torch.flatten(x, 2, 3)
        # Equivalent alternatives:
        # b, c, w, h = x.shape
        # x = x.reshape(b, c, w * h)
        # x = x.view(b, c, -1)
        x = self.avgpool(x)          # B, C, L -> B, C, 1
        x = torch.flatten(x, 1)      # B, C, 1 -> B, C
        x = self.head(x)             # B, C    -> B, 10
        return x

def export_norm_onnx():
    input = torch.rand(1, 3, 64, 64)
    model = Model()
    file = "../models/sample-reshape.onnx"
    torch.onnx.export(
        model         = model,
        args          = (input,),
        f             = file,
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)
    print("Finished normal onnx export")

    model_onnx = onnx.load(file)

    # Check the exported onnx model
    onnx.checker.check_model(model_onnx)

    # Use onnx-simplifier to simplify the onnx graph.
    # Try commenting the simplification out and compare the flatten
    # operation before and after. ONNX graphs also contain constant values
    # and nodes that need no tracing in the compute graph; have a look at
    # them in Netron to see what they do.
    #========= uncomment the lines below to use onnx-simplifier =========#
    # print(f"Simplifying with onnx-simplifier {onnxsim.__version__}...")
    # model_onnx, check = onnxsim.simplify(model_onnx)
    # assert check, "assert check failed"
    onnx.save(model_onnx, file)

if __name__ == "__main__":
    export_norm_onnx()
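As a hedged add-on (assuming onnx and onnxsim are installed and the file above was exported): comparing the node list before and after simplification shows the shape-computation chain produced by torch.flatten collapsing.

import onnx
import onnxsim

model = onnx.load("../models/sample-reshape.onnx")
print("before:", [n.op_type for n in model.graph.node])

# After simplification the Shape/Slice/Concat chain should collapse into a
# single Reshape with a constant target shape.
model_sim, check = onnxsim.simplify(model)
assert check, "simplify check failed"
print("after: ", [n.op_type for n in model_sim.graph.node])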
3.7 load_torchvision.py
This shows how to convert the models built into torchvision to ONNX.
import torch
import torchvision
import onnxsim
import onnx
import argparse

def get_model(type, dir):
    if type == "resnet":
        model = torchvision.models.resnet50()
        file  = dir + "resnet50.onnx"
    elif type == "vgg":
        model = torchvision.models.vgg11()
        file  = dir + "vgg11.onnx"
    elif type == "mobilenet":
        model = torchvision.models.mobilenet_v3_small()
        file  = dir + "mobilenetV3.onnx"
    elif type == "efficientnet":
        model = torchvision.models.efficientnet_b0()
        file  = dir + "efficientnetb0.onnx"
    elif type == "efficientnetv2":
        model = torchvision.models.efficientnet_v2_s()
        file  = dir + "efficientnetV2.onnx"
    elif type == "regnet":
        model = torchvision.models.regnet_x_1_6gf()
        file  = dir + "regnet1.6gf.onnx"
    else:
        raise ValueError(f"unknown model type: {type}")  # avoid returning unbound names
    return model, file

def export_norm_onnx(model, file, input):
    torch.onnx.export(
        model         = model,
        args          = (input,),
        f             = file,
        input_names   = ["input0"],
        output_names  = ["output0"],
        opset_version = 12)
    print("Finished normal onnx export")

    model_onnx = onnx.load(file)

    # Check the exported onnx model
    onnx.checker.check_model(model_onnx)

    # Use onnx-simplifier to simplify the onnx graph.
    print(f"Simplifying with onnx-simplifier {onnxsim.__version__}...")
    model_onnx, check = onnxsim.simplify(model_onnx)
    assert check, "assert check failed"
    onnx.save(model_onnx, file)

def main(args):
    type  = args.type
    dir   = args.dir
    input = torch.rand(1, 3, 224, 224)

    model, file = get_model(type, dir)
    export_norm_onnx(model, file, input)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--type", type=str, default="vgg")
    parser.add_argument("-d", "--dir", type=str, default="../models/")
    opt = parser.parse_args()
    main(opt)
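For example, to export ResNet-50 (assuming the script is saved as load_torchvision.py and the ../models/ directory exists), run:

python load_torchvision.py --type resnet --dir ../models/

The script then exports the model to ONNX, validates it with onnx.checker, simplifies it with onnx-simplifier, and saves the simplified graph over the same file.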