Debugger

牛牛存

Categories: tvm, TVM Relay

Article link: https://blog.csdn.net/weixin_42164269/article/details/104291864


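The TVM debugger is an interface for debugging the execution of TVM computation graphs. It provides access to the graph structure and the tensor values at TVM runtime.

Debug Exchange Format

1. Computational Graph

The optimized graph built by Relay is dumped as-is in JSON serialized format. It contains all the information about the graph. A UX can either use this graph directly or transform it into a format it understands.

The JSON format of the graph is explained below: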

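1. [nodes] Nodes are either placeholders or computational nodes in the JSON. The nodes are stored as a list. Each node contains the following information:

op - operation type; null means the node is a placeholder/variable/input node, and "tvm_op" means the node can be executed
name - name of the node
inputs - position of the inputs for this operation; a list of tuples of (nodeid, index, version). (Optional)
attrs - attributes of the node, containing the following information:

flatten_data - whether the data needs to be flattened before execution

func_name - name of the fused function; corresponds to the symbol in the lib generated by the Relay compilation process

num_inputs - number of inputs of this node

num_outputs - number of outputs this node produces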

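2. [arg_nodes] arg_nodes is a list of indices of the nodes that are placeholders/variables/inputs or constants/params of the graph.

3. [heads] heads is a list of entries that form the outputs of the graph.

4. [node_row_ptr] node_row_ptr stores the history of the forward path, so that you can skip constructing the entire graph in inference tasks.

5. [attrs] attrs can contain version numbers or similar helpful information.

storage_id - memory slot ID of each node in the storage layout.
dtype - data type of each node (enum value).
dltype - data type of each node, in order.
shape - the k-dimensional shape of each node.
device_index - device assignment of each entry in the graph.

Example of a dumped graph: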

{
  "nodes": [                                    # List of nodes
    {
      "op": "null",                             # operation type = null, this is a placeholder/variable/input or constant/param node
      "name": "x",                              # Name of the argument node
      "inputs": []                              # inputs for this node; empty since this is an argument node
    },
    {
      "op": "tvm_op",                           # operation type = tvm_op, this node can be executed
      "name": "relu0",                          # Name of the node
      "attrs": {                                # Attributes of the node
        "flatten_data": "0",                    # Whether this data need to be flattened
        "func_name": "fuse_l2_normalize_relu",  # Fused function name, corresponds to the symbol in the lib generated by compilation process
        "num_inputs": "1",                      # Number of inputs for this node
        "num_outputs": "1"                      # Number of outputs this node produces
      },
      "inputs": [[0, 0, 0]]                     # Position of the inputs for this operation
    }
  ],
  "arg_nodes": [0],                             # Indices of the nodes that are argument nodes
  "node_row_ptr": [0, 1, 2],                    # Row indices for faster depth first search
  "heads": [[1, 0, 0]],                         # Position of the output nodes for this operation
  "attrs": {                                    # Attributes for the graph
    "storage_id": ["list_int", [1, 0]],         # memory slot id for each node in the storage layout
    "dtype": ["list_int", [0, 0]],              # Datatype of each node (enum value)
    "dltype": ["list_str", [                    # Datatype of each node in order
        "float32",
        "float32"]],
    "shape": ["list_shape", [                   # Shape of each node k order
        [1, 3, 20, 20],
        [1, 3, 20, 20]]],
    "device_index": ["list_int", [1, 1]]        # Device assignment for each node in order
  }
}
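
The format described above can be explored with a few lines of Python. The following is only a minimal sketch, not part of the TVM API: the helper name summarize_graph is hypothetical, the embedded JSON is the example graph above with its "#" annotations stripped so that it parses, and using node_row_ptr[node_id] as the index of a node's first output entry in the per-entry attribute lists is an assumption about how entries are laid out rather than something the text above states.

```python
import json

# The example graph above, with the inline "#" annotations removed so it is valid JSON.
GRAPH_JSON = """
{
  "nodes": [
    {"op": "null", "name": "x", "inputs": []},
    {"op": "tvm_op", "name": "relu0",
     "attrs": {"flatten_data": "0", "func_name": "fuse_l2_normalize_relu",
               "num_inputs": "1", "num_outputs": "1"},
     "inputs": [[0, 0, 0]]}
  ],
  "arg_nodes": [0],
  "node_row_ptr": [0, 1, 2],
  "heads": [[1, 0, 0]],
  "attrs": {
    "storage_id": ["list_int", [1, 0]],
    "dtype": ["list_int", [0, 0]],
    "dltype": ["list_str", ["float32", "float32"]],
    "shape": ["list_shape", [[1, 3, 20, 20], [1, 3, 20, 20]]],
    "device_index": ["list_int", [1, 1]]
  }
}
"""


def summarize_graph(graph):
    """Return one summary dict per executable ("tvm_op") node (hypothetical helper)."""
    nodes = graph["nodes"]
    row_ptr = graph["node_row_ptr"]
    # Graph-level attrs are stored as [type_tag, values]; keep only the values.
    shapes = graph["attrs"]["shape"][1]
    dltypes = graph["attrs"]["dltype"][1]
    summary = []
    for node_id, node in enumerate(nodes):
        if node["op"] != "tvm_op":
            continue  # skip placeholder/variable/input nodes
        # Assumption: row_ptr[node_id] is the index of this node's first output entry.
        entry = row_ptr[node_id]
        summary.append({
            "name": node["name"],
            "func_name": node["attrs"]["func_name"],
            # Each input is a (nodeid, index, version) tuple; resolve nodeid to a name.
            "inputs": [nodes[src]["name"] for src, _index, _version in node["inputs"]],
            "shape": shapes[entry],
            "dltype": dltypes[entry],
        })
    return summary


graph = json.loads(GRAPH_JSON)
ops = summarize_graph(graph)
for op in ops:
    print(op)

# heads uses the same (nodeid, index, version) layout as inputs.
output_names = [graph["nodes"][nid]["name"] for nid, _index, _version in graph["heads"]]
print("outputs:", output_names)
```

A real dump could be fed to the same helper by reading the dumped graph file and passing its contents to json.loads, provided that file contains plain JSON without annotations.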

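2. Tensor Dumping

The tensors received after execution are of type [tvm.ndarray]. All tensors are saved as binary bytes in serialized format. The resulting binary bytes can be loaded with the "load_params" API.

Example of loading the parameters: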

with open(path_params, "rb") as fi:
    loaded_params = bytearray(fi.read())

module.load_params(loaded_params)

How to use the debugger?

In [config.cmake], set the [USE_GRAPH_RUNTIME_DEBUG] flag to ON:

# Whether enable additional graph debug functions
set(USE_GRAPH_RUNTIME_DEBUG ON)

Run 'make' on tvm so that it produces libtvm_runtime.so.
In the frontend script file, instead of [from tvm.contrib import graph_runtime], import [debug_runtime]: [from tvm.contrib.debugger import debug_runtime as graph_runtime]

from tvm.contrib.debugger import debug_runtime as graph_runtime
m = graph_runtime.create(graph, lib, ctx, dump_root="/tmp/tvmdbg")
# set inputs
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)
# execute
m.run()
tvm_out = m.get_output(0, tvm.nd.empty(out_shape, dtype)).asnumpy()

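The outputs are dumped to a temporary folder under [/tmp], or to the folder specified when creating the runtime.

Sample Output

Below is a sample output of the debugger.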

Node Name               Ops                                                                  Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
---------               ---                                                                  --------   -------  ----------       --------         -----                ------  -------
1_NCHW1c                fuse___layout_transform___4                                          56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
_contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc                                           12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
relu0_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    4375.43    1.2      15:24:44.190027  15:24:44.194410  (8, 1, 5, 5, 1, 8)   2       1
_contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1                                         213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
relu1_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    2265.57    0.62     15:24:44.407600  15:24:44.409874  (64, 1, 1)           2       1
_contrib_conv2d_nchwc2  fuse__contrib_conv2d_NCHWc_2                                         104623.15  28.61    15:24:44.409905  15:24:44.514535  (1, 8, 224, 224, 8)  2       1
relu2_NCHW2c            fuse___layout_transform___broadcast_add_relu___layout_transform___1  2004.77    0.55     15:24:44.514567  15:24:44.516582  (8, 8, 3, 3, 8, 8)   2       1
_contrib_conv2d_nchwc3  fuse__contrib_conv2d_NCHWc_3                                         25218.4    6.9      15:24:44.516628  15:24:44.541856  (1, 8, 224, 224, 8)  2       1
reshape1                fuse___layout_transform___broadcast_add_reshape_transpose_reshape    1554.25    0.43     15:24:44.541893  15:24:44.543452  (64, 1, 1)           2       1
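
To find the hot spots in a profile like the one above, the table can be parsed and sorted by time. This is a rough sketch under stated assumptions, not a TVM utility: the helper name parse_profile is hypothetical, the text is assumed to have been captured from the debugger output into a string, and columns are assumed to be separated by at least two spaces (which holds for the sample, since shapes such as "(1, 1, 224, 224)" only contain single spaces). Three rows from the sample are embedded, with the padding of the Ops column shortened for readability.

```python
import re

# Three rows copied from the sample output above (Ops column padding shortened).
SAMPLE = """\
Node Name               Ops                           Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
---------               ---                           --------   -------  ----------       --------         -----                ------  -------
1_NCHW1c                fuse___layout_transform___4   56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
_contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc    12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
_contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1  213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
"""

# Assumption: columns are separated by two or more spaces.
COLUMN_SEP = re.compile(r"\s{2,}")


def parse_profile(text):
    """Parse the debugger table into a list of dicts keyed by column header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = COLUMN_SEP.split(lines[0].strip())
    rows = []
    for line in lines[2:]:  # lines[1] is the dashed separator row
        fields = COLUMN_SEP.split(line.strip())
        row = dict(zip(header, fields))
        row["Time(us)"] = float(row["Time(us)"])
        row["Time(%)"] = float(row["Time(%)"])
        rows.append(row)
    return rows


rows = parse_profile(SAMPLE)
slowest = sorted(rows, key=lambda r: r["Time(us)"], reverse=True)
for r in slowest:
    print(f'{r["Node Name"]:<24} {r["Time(us)"]:>12.2f} us  {r["Time(%)"]:>6.2f} %')
```

Sorting by Time(us) puts _contrib_conv2d_nchwc1 first, matching the 58.28% share shown for it in the sample.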
