【TVM】A Speed Benchmark Against TensorRT: Is It Worth the Hype?

Introduction
For inference on NVIDIA GPUs, the received wisdom online is that TensorRT is the only sensible choice. Then TVM came along and people started recommending it everywhere. TVM pursues the same goal as TensorRT, accelerating model inference, but it also does what TensorRT does not: it supports multiple platforms such as x86-64, ARMv7, and ARM64. On top of that, TVM is open source, which is a big plus compared with TensorRT.

Open source and multi-backend support are certainly appealing. But can TVM's inference speed actually hold its own against TensorRT?

We will not analyze how the two frameworks achieve their speedups here; we simply benchmark both on ResNet50 and see which one is faster.

0. Hardware

The key hardware and software versions, for reference:

  • TensorRT 5.1.5
  • GPU: GTX 1070 / 8 GB
  • TVM 0.8

1. TensorRT Installation and Testing

1.1 Installing TensorRT

See: Installing TensorRT on Ubuntu 16.04
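
Once installed, a quick sanity check from Python (this just verifies the wheel imports; the exact version string depends on your install):

import tensorrt as trt
print(trt.__version__)  # expect something like 5.1.5.0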

1.2 ResNet50 Benchmark
We call both TensorRT and TVM through their Python APIs.

  • The TensorRT ResNet50 test script lives at: /home/data/CM/profile/TensorRT-5.1.5.0/samples/python/introductory_parser_samples/onnx_resnet50.py
  • Only three steps are timed: feeding the input, inference, and fetching the output; preprocessing, model parsing, etc. are not timed.
  • The loop runs 100 iterations.
  • The input size is 224×224.

Code

import os
import random
import time

import numpy as np

import common  # helper module shipped with the TensorRT Python samples

# build_engine_onnx, allocate_buffers, do_inference, load_normalized_test_case
# and ModelData are defined earlier in onnx_resnet50.py and are omitted here.


def main():
    # Set the data path to the directory that contains the trained models and test images for inference.
    data_path, data_files = common.find_sample_data(description="Runs a ResNet50 network with a TensorRT inference engine.", subfolder="resnet50", find_files=["binoculars.jpeg", "reflex_camera.jpeg", "tabby_tiger_cat.jpg", ModelData.MODEL_PATH, "class_labels.txt"])
    # Get test images, models and labels.
    test_images = data_files[0:3]
    onnx_model_file, labels_file = data_files[3:]
    labels = open(labels_file, 'r').read().split('\n')

    # Build a TensorRT engine.
    with build_engine_onnx(onnx_model_file) as engine:
        # Inference is the same regardless of which parser is used to build the engine, since the model architecture is the same.
        # Allocate buffers and create a CUDA stream.
        h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
        # Contexts are used to perform inference.
        with engine.create_execution_context() as context:
            # Load a normalized test case into the host input page-locked buffer.
            test_image = random.choice(test_images)
            test_case = load_normalized_test_case(test_image, h_input)
            # Run the engine. The output will be a 1D tensor of length 1000, where each value represents the
            # probability that the image corresponds to that label
            for i in range(100):
                start = time.time()
                do_inference(context, h_input, d_input, h_output, d_output, stream)
                end   = time.time()
                print("tensorrt resnet50 used time is:{}".format(end - start))
            # We use the highest probability as our prediction. Its index corresponds to the predicted label.
            pred = labels[np.argmax(h_output)]
            if "_".join(pred.split()) in os.path.splitext(os.path.basename(test_case))[0]:
                print("Correctly recognized " + test_case + " as " + pred)
            else:
                print("Incorrectly recognized " + test_case + " as " + pred)

Output

tensorrt resnet50 used time is:0.0034933090209960938
tensorrt resnet50 used time is:0.003971099853515625
tensorrt resnet50 used time is:0.0035047531127929688
tensorrt resnet50 used time is:0.004241943359375
tensorrt resnet50 used time is:0.0034859180450439453
tensorrt resnet50 used time is:0.003471851348876953
tensorrt resnet50 used time is:0.003989696502685547
tensorrt resnet50 used time is:0.0036356449127197266
tensorrt resnet50 used time is:0.00417017936706543
tensorrt resnet50 used time is:0.003609180450439453
tensorrt resnet50 used time is:0.004082918167114258
tensorrt resnet50 used time is:0.0036220550537109375
tensorrt resnet50 used time is:0.004175424575805664
tensorrt resnet50 used time is:0.0037670135498046875
tensorrt resnet50 used time is:0.0036225318908691406

Roughly 4 ms per inference.
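
For context, the timed region above is more than a kernel launch. In the TensorRT 5.x Python samples, the do_inference helper copies the input to the GPU, executes the engine asynchronously, copies the result back, and synchronizes the stream. A sketch of what that helper does (reconstructed from the sample code; check your local copy for the exact signature):

import pycuda.driver as cuda

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference asynchronously on the stream.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer the prediction back to the host.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Wait until all work on the stream has finished.
    stream.synchronize()

So the ~4 ms figure includes the host-to-device and device-to-host copies, which matches what the TVM loop below measures.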

2. TVM Test

2.1 Installing TVM
See: Install TVM; if you hit problems, leave a comment.
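
After building, a quick sanity check (assuming the tvm Python package is on your PYTHONPATH and CUDA was enabled in the build):

import tvm
print(tvm.__version__)   # expect 0.8.x
print(tvm.gpu(0).exist)  # True if the CUDA device is usable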

2.2 Benchmark
Here we reuse the ResNet50.onnx from the TensorRT test above, so both frameworks run exactly the same model.

Code:

import tvm
from tvm import relay

import numpy as np

from tvm.contrib.download import download_testdata

import time

import onnx

model_path = "./ResNet50.onnx"
onnx_model = onnx.load_model(model_path)


from PIL import Image

img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
img_path = download_testdata(img_url, "cat.png", module="data")
img = Image.open(img_path).resize((224, 224))

# Preprocess the image and convert to tensor
from torchvision import transforms

my_preprocess = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
img = my_preprocess(img).numpy()  # torch tensor -> NumPy array (CHW)
img = np.expand_dims(img, 0)      # add batch dimension -> NCHW


# Input tensor name as it appears inside this particular ResNet50.onnx.
input_name = "gpu_0/data_0"
shape_list = {input_name: img.shape}
mod, params = relay.frontend.from_onnx(onnx_model, shape_list)

opt_level = 3
target = tvm.target.cuda()
ctx = tvm.gpu()
with tvm.transform.PassContext(opt_level=opt_level):
    lib = relay.build(mod, target, params=params)


from tvm.contrib import graph_runtime


dtype = "float32"
model_name = "ResNet50"  # used in the timing printout below
m = graph_runtime.GraphModule(lib["default"](ctx))

for i in range(500):
    start = time.time()
    # Set inputs
    m.set_input(input_name, tvm.nd.array(img.astype(dtype)))
    # Execute
    m.run()
    # Get outputs
    tvm_output = m.get_output(0)
    end = time.time()

    print("{} used time is:{}".format(model_name, end-start))

Output:

ResNet50 used time is:0.0061075687408447266
ResNet50 used time is:0.005687713623046875
ResNet50 used time is:0.005716085433959961
ResNet50 used time is:0.005555152893066406
ResNet50 used time is:0.0057582855224609375
ResNet50 used time is:0.0059239864349365234
ResNet50 used time is:0.005617380142211914
ResNet50 used time is:0.0059549808502197266
ResNet50 used time is:0.006211042404174805
ResNet50 used time is:0.006203889846801758

Roughly 6 ms per inference.
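
Side note: the loop above measures set_input, run, and get_output from Python, so it includes memory copies and interpreter overhead. TVM also ships a built-in timer that benchmarks only the run call on-device. A minimal sketch using the module built above (API as of TVM 0.8; the number/repeat values are arbitrary):

# Run the graph 30 batches of 10 runs each and report milliseconds.
ftimer = m.module.time_evaluator("run", ctx, number=10, repeat=30)
prof_res = np.array(ftimer().results) * 1000  # results are in seconds
print("TVM ResNet50: mean %.3f ms, std %.3f ms" % (prof_res.mean(), prof_res.std()))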

Other notes

  1. During model optimization, TVM printed:
WARNING:autotvm:Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -model=unknown -thread_warp_size=32, workload=('dense_small_batch.cuda', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

The TVM discuss forum shows that the TVM team has tuned ResNet50, but the final classification (dense) layer here clearly falls back to a default schedule. Could that be why TVM is ~2 ms slower than TensorRT here?

  2. If you run some other model through TVM, for example a recent HRNet, you will see many warnings like the one above, so you have to learn TVM's tuning workflow and tune for your own model (a tuning sketch follows below). TensorRT, by contrast, only requires that the model parse and convert correctly; the optimization itself is a black box.
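
For reference, the standard answer to those fallback warnings is AutoTVM tuning: extract the tunable tasks from the Relay module, search each task's schedule space, log the best configs, and rebuild. A minimal sketch following the official GPU tuning tutorial, reusing mod/params/target from the script above (the trial counts and runner settings are illustrative, not tuned):

from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

log_file = "resnet50_cuda.log"

# One tuning task per tunable operator (conv2d, dense, ...).
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

for i, task in enumerate(tasks):
    print("Tuning task %d/%d" % (i + 1, len(tasks)))
    tuner = XGBTuner(task, loss_type="rank")
    tuner.tune(
        n_trial=min(1000, len(task.config_space)),
        measure_option=autotvm.measure_option(
            builder=autotvm.LocalBuilder(timeout=10),
            runner=autotvm.LocalRunner(number=20, repeat=3, timeout=4),
        ),
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )

# Rebuild with the best schedules found during tuning.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target, params=params)

Tuning a full ResNet50 this way can take hours on a single GPU, but it is exactly what should eliminate the dense_small_batch fallback above.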

Summary

  • On NVIDIA GPUs, TensorRT is still the first choice for both ease of use and speed.
  • TVM's advantages are being open source and supporting many platforms.