【TVM】A Speed Benchmark Against TensorRT: Is It Worth the Hype?

Introduction
For inference on NVIDIA GPUs, the received wisdom online is that TensorRT is the only sensible choice. Then TVM came along and people started recommending it everywhere. TVM pursues the same goal as TensorRT, accelerating model inference, but it also does what TensorRT does not: it supports multiple platforms such as x86-64, ARMv7, and ARM64. On top of that, TVM is open source, which is a big plus compared with TensorRT.

Open source and multi-backend support are certainly appealing. But can TVM's inference speed actually hold its own against TensorRT?

We will not analyze how the two frameworks achieve their speedups here; we simply benchmark both on ResNet50 and see which one is faster.

0. Hardware

The key hardware and software versions, for reference:

  • TensorRT 5.1.5
  • GPU: GTX 1070 / 8 GB
  • TVM 0.8

1. TensorRT Installation and Testing

1.1 Installing TensorRT

See: Installing TensorRT on Ubuntu 16.04
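
Once installed, a quick sanity check from Python (this just verifies the wheel imports; the exact version string depends on your install):

import tensorrt as trt
print(trt.__version__)  # expect something like 5.1.5.0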

1.2 ResNet50 Benchmark
We call both TensorRT and TVM through their Python APIs.

  • The TensorRT ResNet50 test script lives at: /home/data/CM/profile/TensorRT-5.1.5.0/samples/python/introductory_parser_samples/onnx_resnet50.py
  • Only three steps are timed: feeding the input, inference, and fetching the output; preprocessing, model parsing, etc. are not timed.
  • The loop runs 100 iterations.
  • The input size is 224×224.

Code

import os
import random
import time

import numpy as np

import common  # helper module shipped with the TensorRT Python samples

# build_engine_onnx, allocate_buffers, do_inference, load_normalized_test_case
# and ModelData are defined earlier in onnx_resnet50.py and are omitted here.


def main():
    # Set the data path to the directory that contains the trained models and test images for inference.
    data_path, data_files = common.find_sample_data(description="Runs a ResNet50 network with a TensorRT inference engine.", subfolder="resnet50", find_files=["binoculars.jpeg", "reflex_camera.jpeg", "tabby_tiger_cat.jpg", ModelData.MODEL_PATH, "class_labels.txt"])
    # Get test images, models and labels.
    test_images = data_files[0:3]
    onnx_model_file, labels_file = data_files[3:]
    labels = open(labels_file, 'r').read().split('\n')

    # Build a TensorRT engine.
    with build_engine_onnx(onnx_model_file) as engine:
        # Inference is the same regardless of which parser is used to build the engine, since the model architecture is the same.
        # Allocate buffers and create a CUDA stream.
        h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
        # Contexts are used to perform inference.
        with engine.create_execution_context() as context:
            # Load a normalized test case into the host input page-locked buffer.
            test_image = random.choice(test_images)
            test_case = load_normalized_test_case(test_image, h_input)
            # Run the engine. The output will be a 1D tensor of length 1000, where each value represents the
            # probability that the image corresponds to that label
            for i in range(100):
                start = time.time()
                do_inference(context, h_input, d_input, h_output, d_output, stream)
                end   = time.time()
                print("tensorrt resnet50 used time is:{}".format(end - start))
            # We use the highest probability as our prediction. Its index corresponds to the predicted label.
            pred = labels[np.argmax(h_output)]
            if "_".join(pred.split()) in os.path.splitext(os.path.basename(test_case))[0]:
                print("Correctly recognized " + test_case + " as " + pred)
            else:
                print("Incorrectly recognized " + test_case + " as " + pred)

Output

tensorrt resnet50 used time is:0.0034933090209960938
tensorrt resnet50 used time is:0.003971099853515625
tensorrt resnet50 used time is:0.0035047531127929688
tensorrt resnet50 used time is:0.004241943359375
tensorrt resnet50 used time is:0.0034859180450439453
tensorrt resnet50 used time is:0.003471851348876953
tensorrt resnet50 used time is:0.003989696502685547
tensorrt resnet50 used time is:0.0036356449127197266
tensorrt resnet50 used time is:0.00417017936706543
tensorrt resnet50 used time is:0.003609180450439453
tensorrt resnet50 used time is:0.004082918167114258
tensorrt resnet50 used time is:0.0036220550537109375
tensorrt resnet50 used time is:0.004175424575805664
tensorrt resnet50 used time is:0.0037670135498046875
tensorrt resnet50 used time is:0.0036225318908691406

Roughly 4 ms per inference.
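
For context, the timed region above is more than a kernel launch. In the TensorRT 5.x Python samples, the do_inference helper copies the input to the GPU, executes the engine asynchronously, copies the result back, and synchronizes the stream. A sketch of what that helper does (reconstructed from the sample code; check your local copy for the exact signature):

import pycuda.driver as cuda

def do_inference(context, h_input, d_input, h_output, d_output, stream):
    # Transfer input data to the GPU.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    # Run inference asynchronously on the stream.
    context.execute_async(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
    # Transfer the prediction back to the host.
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    # Wait until all work on the stream has finished.
    stream.synchronize()

So the ~4 ms figure includes the host-to-device and device-to-host copies, which matches what the TVM loop below measures.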

2. TVM Test

2.1 Installing TVM
See: Install TVM; if you hit problems, leave a comment.
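
After building, a quick sanity check (assuming the tvm Python package is on your PYTHONPATH and CUDA was enabled in the build):

import tvm
print(tvm.__version__)   # expect 0.8.x
print(tvm.gpu(0).exist)  # True if the CUDA device is usable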

2.2 Benchmark
Here we reuse the ResNet50.onnx from the TensorRT test above, so both frameworks run exactly the same model.

Code:

import tvm
from tvm import relay

import numpy as np

from tvm.contrib.download import download_testdata

import time

import onnx

model_path = "./ResNet50.onnx"
onnx_model = onnx.load_model(model_path)


from PIL import Image

img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
img_path = download_testdata(img_url, "cat.png", module="data")
img = Image.open(img_path).resize((224, 224))

# Preprocess the image and convert to tensor
from torchvision import transforms

my_preprocess = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
img = my_preprocess(img).numpy()  # torch tensor -> NumPy array (CHW)
img = np.expand_dims(img, 0)      # add batch dimension -> NCHW


# Input tensor name as it appears inside this particular ResNet50.onnx.
input_name = "gpu_0/data_0"
shape_list = {input_name: img.shape}
mod, params = relay.frontend.from_onnx(onnx_model, shape_list)

opt_level = 3
target = tvm.target.cuda()
ctx = tvm.gpu()
with tvm.transform.PassContext(opt_level=opt_level):
    lib = relay.build(mod, target, params=params)


from tvm.contrib import graph_runtime


dtype = "float32"
model_name = "ResNet50"  # used in the timing printout below
m = graph_runtime.GraphModule(lib["default"](ctx))

for i in range(500):
    start = time.time()
    # Set inputs
    m.set_input(input_name, tvm.nd.array(img.astype(dtype)))
    # Execute
    m.run()
    # Get outputs
    tvm_output = m.get_output(0)
    end = time.time()

    print("{} used time is:{}".format(model_name, end-start))

Output:

ResNet50 used time is:0.0061075687408447266
ResNet50 used time is:0.005687713623046875
ResNet50 used time is:0.005716085433959961
ResNet50 used time is:0.005555152893066406
ResNet50 used time is:0.0057582855224609375
ResNet50 used time is:0.0059239864349365234
ResNet50 used time is:0.005617380142211914
ResNet50 used time is:0.0059549808502197266
ResNet50 used time is:0.006211042404174805
ResNet50 used time is:0.006203889846801758

Roughly 6 ms per inference.
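
Side note: the loop above measures set_input, run, and get_output from Python, so it includes memory copies and interpreter overhead. TVM also ships a built-in timer that benchmarks only the run call on-device. A minimal sketch using the module built above (API as of TVM 0.8; the number/repeat values are arbitrary):

# Run the graph 30 batches of 10 runs each and report milliseconds.
ftimer = m.module.time_evaluator("run", ctx, number=10, repeat=30)
prof_res = np.array(ftimer().results) * 1000  # results are in seconds
print("TVM ResNet50: mean %.3f ms, std %.3f ms" % (prof_res.mean(), prof_res.std()))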

Other notes

  1. During model optimization, TVM printed:
WARNING:autotvm:Cannot find config for target=cuda -keys=cuda,gpu -max_num_threads=1024 -model=unknown -thread_warp_size=32, workload=('dense_small_batch.cuda', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

The TVM discuss forum shows that the TVM team has tuned ResNet50, but the final classification (dense) layer here clearly falls back to a default schedule. Could that be why TVM is ~2 ms slower than TensorRT here?

  2. If you run some other model through TVM, for example a recent HRNet, you will see many warnings like the one above, so you have to learn TVM's tuning workflow and tune for your own model (a tuning sketch follows below). TensorRT, by contrast, only requires that the model parse and convert correctly; the optimization itself is a black box.
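
For reference, the standard answer to those fallback warnings is AutoTVM tuning: extract the tunable tasks from the Relay module, search each task's schedule space, log the best configs, and rebuild. A minimal sketch following the official GPU tuning tutorial, reusing mod/params/target from the script above (the trial counts and runner settings are illustrative, not tuned):

from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

log_file = "resnet50_cuda.log"

# One tuning task per tunable operator (conv2d, dense, ...).
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

for i, task in enumerate(tasks):
    print("Tuning task %d/%d" % (i + 1, len(tasks)))
    tuner = XGBTuner(task, loss_type="rank")
    tuner.tune(
        n_trial=min(1000, len(task.config_space)),
        measure_option=autotvm.measure_option(
            builder=autotvm.LocalBuilder(timeout=10),
            runner=autotvm.LocalRunner(number=20, repeat=3, timeout=4),
        ),
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )

# Rebuild with the best schedules found during tuning.
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target, params=params)

Tuning a full ResNet50 this way can take hours on a single GPU, but it is exactly what should eliminate the dense_small_batch fallback above.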

Summary

  • On NVIDIA GPUs, TensorRT is still the first choice for both ease of use and speed.
  • TVM's advantages are being open source and supporting many platforms.