TVM User Tutorial -- Quick Start Tutorial for Compiling Deep Learning Models

姆克儿

Original link: https://tvm.apache.org/docs/tutorial/relay_quick_start.html

Author: Yao Wang, Truman Tian

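This example shows how to build a neural network with the Relay Python frontend and how to generate a runtime library for an Nvidia GPU with TVM. Note that you need to build TVM with cuda and llvm enabled.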

Overview of Hardware Backends Supported by TVM

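The figure below shows the hardware backends currently supported by TVM:
[Figure: hardware backends supported by TVM]
In this tutorial, we will choose cuda and llvm as the target backends. First, let's import Relay and TVM.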

import numpy as np

from tvm import relay
from tvm.relay import testing
import tvm
from tvm import te
from tvm.contrib import graph_executor
import tvm.testing

Define a Neural Network in Relay

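First, let's define a neural network with the Relay Python frontend. For simplicity, we will use the pre-defined resnet-18 network in Relay. The parameters are initialized with the Xavier initializer. Relay also supports other model formats such as MXNet, CoreML, ONNX and Tensorflow.
In this tutorial, we assume that inference will run on our device and that the batch size is set to 1. The input images are RGB color images of size 224 x 224. We can call tvm.relay.expr.TupleWrapper.astext() to show the network structure.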

batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

mod, params = relay.testing.resnet.get_workload(
    num_layers=18, batch_size=batch_size, image_shape=image_shape
)

# set show_meta_data=True if you want to show meta data
print(mod.astext(show_meta_data=False))
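
To make the shape bookkeeping above concrete, the following sketch recomputes data_shape and out_shape in plain Python and estimates how many bytes one input batch and one output batch occupy. It uses only the standard library. The helper name tensor_nbytes and the 4-byte element size (float32, the dtype used for the input later in this tutorial) are illustrative assumptions and are not part of TVM.

```python
import math

batch_size = 1
num_class = 1000
image_shape = (3, 224, 224)  # channels, height, width

data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)


def tensor_nbytes(shape, bytes_per_element=4):
    """Hypothetical helper: size in bytes of a dense tensor (float32 by default)."""
    return math.prod(shape) * bytes_per_element


print(data_shape)                 # shape of one input batch
print(tensor_nbytes(data_shape))  # bytes needed for the input
print(tensor_nbytes(out_shape))   # bytes needed for the output
```

Increasing batch_size scales both sizes linearly, which is worth keeping in mind when sizing buffers on the target device.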

Compile

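The next step is to compile the model using the Relay/TVM pipeline. Users can specify the optimization level of the compilation. Currently this value can be 0 to 3. The optimization passes include operator fusion, pre-computation, layout transformation and so on.
relay.build() returns three components: the execution graph in json format, the TVM module library of functions compiled specifically for this graph on the target hardware, and the parameter blobs of the model. In the code below, these components come back bundled in a single object, lib. During compilation, Relay does the graph-level optimization while TVM does the tensor-level optimization, resulting in an optimized runtime module for model serving.
We will first compile for Nvidia GPU. Behind the scenes, relay.build() first performs a number of graph-level optimizations, such as pruning and fusing, and then registers the operators (i.e. the nodes of the optimized graph) to TVM implementations to generate a tvm.module. To generate the module library, TVM first lowers the high-level IR into the low-level intrinsic IR of the specified target backend, which is CUDA in this example. The machine code is then generated as the module library.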

opt_level = 3
target = tvm.target.cuda()
with tvm.transform.PassContext(opt_level=opt_level):
    lib = relay.build(mod, target, params=params)

Out:
/workspace/python/tvm/target/target.py:282: UserWarning: Try specifying cuda arch by adding 'arch=sm_xx' to your target.
  warnings.warn("Try specifying cuda arch by adding 'arch=sm_xx' to your target.")
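
Operator fusion, one of the optimization passes mentioned above, can be illustrated without TVM. The sketch below is a conceptual toy in plain Python and not TVM's actual implementation: the function names unfused and fused are hypothetical, and the chain scale, add bias, ReLU is just one example of element-wise operators that a compiler could merge into a single loop.

```python
def unfused(xs, scale, bias):
    """Three separate passes, each materializing an intermediate list."""
    scaled = [x * scale for x in xs]
    shifted = [s + bias for s in scaled]
    return [max(v, 0.0) for v in shifted]


def fused(xs, scale, bias):
    """One pass computing relu(x * scale + bias) with no intermediates."""
    return [max(x * scale + bias, 0.0) for x in xs]


xs = [-2.0, -0.5, 0.0, 1.5, 3.0]

# Both versions compute the same values; only the memory traffic differs.
assert unfused(xs, 2.0, 1.0) == fused(xs, 2.0, 1.0)
print(fused(xs, 2.0, 1.0))
```

The fused version reads each input element once and writes each output element once, which is the kind of saving that fusion aims for on a GPU, where intermediate tensors would otherwise travel through global memory.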

Run the Generated Library

Now we can create the graph executor and run the module on the Nvidia GPU.

# create random input
dev = tvm.cuda()
data = np.random.uniform(-1, 1, size=data_shape).astype("float32")
# create module
module = graph_executor.GraphModule(lib["default"](dev))
# set input and parameters
module.set_input("data", data)
# run
module.run()
# get output
out = module.get_output(0, tvm.nd.empty(out_shape)).numpy()

# Print first 10 elements of output
print(out.flatten()[0:10])

Out:
[0.00089283 0.00103331 0.0009094  0.00102275 0.00108751 0.00106737
 0.00106262 0.00095838 0.00110792 0.00113151]
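
The printed values all sit close to 0.001, i.e. 1/1000. One plausible reading, stated here as an assumption rather than something the tutorial confirms, is that the network ends in a softmax over the 1000 classes and that the randomly initialized weights produce nearly equal logits. The standard-library sketch below shows that a softmax over 1000 small random logits behaves this way; the logit range of -0.1 to 0.1 is an arbitrary stand-in and is not taken from the model.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
num_class = 1000
# Assumed stand-in for the logits of an untrained network: small random values.
logits = [random.uniform(-0.1, 0.1) for _ in range(num_class)]
probs = softmax(logits)

print(probs[:3])   # each probability is close to 1 / num_class
print(sum(probs))  # probabilities sum to approximately 1
```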

Save and Load the Compiled Module

We can also save the graph, library and parameters to a file and load them back in the deployment environment.

# export the compiled library (graph, lib and params) to a single archive
from tvm.contrib import utils

temp = utils.tempdir()
path_lib = temp.relpath("deploy_lib.tar")
lib.export_library(path_lib)
print(temp.listdir())

Out:
['deploy_lib.tar']

# load the module back.
loaded_lib = tvm.runtime.load_module(path_lib)
input_data = tvm.nd.array(data)

module = graph_executor.GraphModule(loaded_lib["default"](dev))
module.run(data=input_data)
out_deploy = module.get_output(0).numpy()

# Print first 10 elements of output
print(out_deploy.flatten()[0:10])

# check whether the output from deployed module is consistent with original one
tvm.testing.assert_allclose(out_deploy, out, atol=1e-5)

Out:
[0.00089283 0.00103331 0.0009094  0.00102275 0.00108751 0.00106737
 0.00106262 0.00095838 0.00110792 0.00113151]
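
The consistency check above passes when the two outputs agree within a tolerance. The sketch below spells out the NumPy-style criterion |actual - desired| <= atol + rtol * |desired| in plain Python. It assumes that tvm.testing.assert_allclose follows this rule. The helper all_close and its rtol default are illustrative assumptions, and the perturbed out_deploy values are invented for the demonstration; they are not results from a real run.

```python
def all_close(actual, desired, rtol=1e-7, atol=1e-5):
    """Hypothetical helper: element-wise |a - d| <= atol + rtol * |d|."""
    return all(
        abs(a - d) <= atol + rtol * abs(d)
        for a, d in zip(actual, desired)
    )


out = [0.00089283, 0.00103331, 0.0009094]
# Invented perturbation of about 1e-8, well inside atol=1e-5.
out_deploy = [0.00089284, 0.00103330, 0.0009094]

print(all_close(out_deploy, out))             # small drift is accepted
print(all_close([0.00189283], [0.00089283]))  # a 1e-3 gap is rejected
```

Because the outputs here are around 1e-3, an absolute tolerance of 1e-5 corresponds to roughly 1% of each value, so the check is fairly loose for this particular model.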
