TVM Study Notes (7): Operators

Latest recommended article published on 2024-03-17 23:08:30

yhwang-hub

Original link: https://zhuanlan.zhihu.com/p/351403985

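This article looks at how operators are implemented in TVM. It starts from the conversion of the Relay conv2d operator, explains how AttrCvt works, and covers operator registration and implementation on the C++ side. It then introduces the TOPI operator library, including how te is used to express a computation graph, and uses the compute operator as an example to examine its internals.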

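Relay operators
The previous chapter walked through the TVM frontend, which converts TensorFlow operators into TVM operators. This chapter takes convolution, the most common operation in CNNs, as an example and looks at how a Relay operator is represented. In python/tvm/relay/frontend/tensorflow.py, _convert_map contains: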

_convert_map = {
…
'Conv2D'                            : _conv('conv'),
…
}

Inside _conv, the weights, inputs, and outputs are rearranged according to the layout, and then AttrCvt is called to obtain the op.

out = AttrCvt(op_name=_dimension_picker('conv',
                                      surfix="_transpose" if opname == 'conv_transpose' else ""),
            ignores=['explicit_paddings'],
            transforms={
                'kernel_shape': 'kernel_size',
                'data_format': 'data_layout',
                'dilations': ('dilation', (0, 0)),
                'group': ('groups', 1)},
            custom_check=_dimension_constraint())([inputs_data, inputs[1]], attr)

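AttrCvt is defined in the file python/tvm/relay/frontend/common.py. According to its comments, this class mainly implements operator conversion; in practice it maps the op_name passed in to a Relay operator. It first checks the attrs passed in: unsupported attributes raise an error or a warning, and attributes that have a transform are replaced by the new attribute. Finally it calls get_relay_op, which searches _op (imported from the tvm.relay.op package) for an op with that name and returns it. All ops live in the python/tvm/relay/op package, and conv is defined under op/nn. In nn.py the call chain is conv2d -> _make.conv2d(). _make.py is where the functions registered in C++ are exposed to Python, and it does so with a single line: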
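The attribute handling described above can be sketched in plain Python. This is a toy model, not TVM's implementation: toy_conv2d, TOY_OPS, convert_attrs, and toy_attr_cvt are hypothetical names, and the real AttrCvt supports more rule forms and checks than shown here. The transforms and ignores values are taken from the _conv snippet.

```python
# Toy stand-in for the Relay op table (hypothetical); in TVM the lookup
# is done by get_relay_op.
def toy_conv2d(data, weight, **attrs):
    return ("nn.conv2d", data, weight, attrs)


TOY_OPS = {"conv2d": toy_conv2d}


def convert_attrs(attrs, transforms, ignores=(), excludes=()):
    """Rename, default, or drop framework attributes (hypothetical helper)."""
    new_attrs = {}
    for key, value in attrs.items():
        if key in excludes:
            raise NotImplementedError(f"attribute {key} is not supported")
        if key in ignores:
            continue  # silently dropped in this sketch
        if key in transforms:
            rule = transforms[key]
            # A rule is either "new_name" or ("new_name", default_value).
            new_name, default = rule if isinstance(rule, tuple) else (rule, None)
            new_attrs[new_name] = default if value is None else value
        else:
            new_attrs[key] = value  # passed through unchanged
    return new_attrs


def toy_attr_cvt(op_name, transforms, ignores=()):
    """Return a converter that maps (inputs, attrs) to a toy op call."""
    def _impl(inputs, attrs):
        new_attrs = convert_attrs(attrs, transforms, ignores)
        return TOY_OPS[op_name](*inputs, **new_attrs)
    return _impl


tf_attrs = {
    "kernel_shape": (3, 3),
    "data_format": "NHWC",
    "dilations": None,          # missing value, so the default is used
    "explicit_paddings": [],    # listed in ignores, so it is dropped
}
convert = toy_attr_cvt(
    "conv2d",
    transforms={
        "kernel_shape": "kernel_size",
        "data_format": "data_layout",
        "dilations": ("dilation", (0, 0)),
        "group": ("groups", 1),
    },
    ignores=["explicit_paddings"],
)
result = convert(["data", "weight"], tf_attrs)
print(result)
```

The point of the design is that each frontend operator only has to declare a small table of renames and defaults; the shared converter does the rest.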

tvm._ffi._init_api("relay.op.nn._make", __name__)

_init_api is defined in python/tvm/_ffi/registry.py. Let us take a look:

def _init_api(namespace, target_module_name=None):
 
    target_module_name = (
        target_module_name if target_module_name else namespace)
    if namespace.startswith("tvm."):
        _init_api_prefix(target_module_name, namespace[4:])
    else:
        _init_api_prefix(target_module_name, namespace)
 
 
def _init_api_prefix(module_name, prefix):
    module = sys.modules[module_name]
 
    for name in list_global_func_names():
        if not name.startswith(prefix):
            continue
 
        fname = name[len(prefix)+1:]
        target_module = module
 
        if fname.find(".") != -1:
            continue
        f = get_global_func(name)
        ff = _get_api(f)
        ff.__name__ = fname
        ff.__doc__ = ("TVM PackedFunc %s. " % fname)
        setattr(target_module, ff.__name__, ff)

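In effect, it fetches the functions registered in C++ by name and attaches them to the _make.py module. The result is as if _make.py itself defined conv2d and the related functions.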
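The prefix-matching step can be tried out without TVM. The sketch below is a simplified model under stated assumptions: GLOBAL_FUNCS is a hypothetical dictionary standing in for the table of functions registered from C++, and init_api_prefix is a hypothetical reduction of _init_api_prefix that skips the _get_api wrapping.

```python
import types

# Hypothetical stand-in for the global function table filled from C++.
GLOBAL_FUNCS = {
    "relay.op.nn._make.conv2d": lambda *args: ("conv2d", args),
    "relay.op.nn._make.dense": lambda *args: ("dense", args),
    "relay.op.nn._make.sub.helper": lambda *args: ("helper", args),
    "relay.op._make.add": lambda *args: ("add", args),
}


def init_api_prefix(module, prefix):
    """Attach every function named '<prefix>.<fname>' to module as fname."""
    for name, func in GLOBAL_FUNCS.items():
        if not name.startswith(prefix):
            continue
        fname = name[len(prefix) + 1:]
        if "." in fname:
            continue  # belongs to a nested namespace, not to this module
        setattr(module, fname, func)


# A fresh module object plays the role of _make.py.
toy_make = types.ModuleType("toy_make")
init_api_prefix(toy_make, "relay.op.nn._make")

print(toy_make.conv2d("data", "weight"))
```

After the call, toy_make has conv2d and dense as attributes, while the nested sub.helper entry and the entry from another namespace are left out.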

TVM_REGISTER_GLOBAL("relay.op.nn._make.conv2d").set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,  Array<IndexExpr> dilation, int groups, IndexExpr channels, Array<IndexExpr> kernel_size, String data_layout, String kernel_layout, String out_layout, DataType out_dtype) {
      return MakeConv<Conv2DAttrs>(data, weight, strides, padding, dilation, groups, channels, kernel_size, data_layout, kernel_layout, out_layout, out_dtype, "nn.conv2d");
    });

RELAY_REGISTER_OP("nn.conv2d")
    .describe(R"code(2D convolution layer (e.g. spatial convolution over images).
 
This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.
 
- **data**: This depends on the `layout` parameter. Input is 4D array of shape
            (batch_size, in_channels, height, width) if `layout` is `NCHW`.
- **weight**: (channels, in_channels, kernel_size[0], kernel_size[1])
- **out**:  This depends on the `layout` parameter. Output is 4D array of shape
            (batch_size, channels, out_height, out_width) if `layout` is `NCHW`.
 
)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv2DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv2D", Conv2DRel<Conv2DAttrs>)
.set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv2DAttrs>);

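Next we move to C++ and look at how the conv operator is implemented.

C++ uses the macro TVM_REGISTER_GLOBAL to register the function that builds the operator in a global object. Here is the macro definition: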

#define TVM_REGISTER_GLOBAL(OpName) \
  TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = ::tvm::runtime::Registry::Register(OpName)

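The registration is implemented in the Registry class. Its Register function adds the entry through a global manager. set_body (set_body_typed in the code above) wraps a lambda or an ordinary function in the registry's uniform function wrapper, PackedFunc.

The body set here builds a conv operator call through MakeConv, and that function is what gets registered. MakeConv first builds an attrs object from the conv parameters passed in (strides, kernel, layout, and so on), then fetches the already registered conv op by its name, and finally packs attrs and op together into a Call.

The Op itself is registered in a shared AttrRegistry through RELAY_REGISTER_OP. An Op object does not actually contain the computation of the operator; it only records the operator's inputs, outputs, and attribute information.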
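The split between an op (metadata only) and a call (op plus arguments and attrs) can be modeled in a few lines of Python. Everything here is hypothetical and only illustrates the structure described above: ToyOp, ToyCall, OP_REGISTRY, register_op, and make_conv2d are made-up names, not TVM classes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOp:
    """Metadata about an operator; note there is no compute logic here."""
    name: str
    num_inputs: int = 0
    arguments: list = field(default_factory=list)

    def set_num_inputs(self, n):
        self.num_inputs = n
        return self  # returning self allows chained registration calls

    def add_argument(self, name, type_name, description):
        self.arguments.append((name, type_name, description))
        return self


@dataclass
class ToyCall:
    """A use of an op: the op, its input expressions, and its attributes."""
    op: ToyOp
    args: list
    attrs: dict


OP_REGISTRY = {}


def register_op(name):
    """Create the op on first use and return it (toy registry)."""
    return OP_REGISTRY.setdefault(name, ToyOp(name))


def make_conv2d(data, weight, strides, padding, groups):
    """Build attrs, fetch the registered op by name, pack both into a call."""
    attrs = {"strides": strides, "padding": padding, "groups": groups}
    op = OP_REGISTRY["nn.conv2d"]
    return ToyCall(op, [data, weight], attrs)


(register_op("nn.conv2d")
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor."))

call = make_conv2d("data", "weight", strides=(1, 1), padding=(0, 0), groups=1)
print(call.op.name, call.attrs)
```

Every call to make_conv2d shares the same registered op object and differs only in its args and attrs.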

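TOPI operators
TOPI is TVM's own operator library. Its operators are expressed with te, in a style similar to numpy: numpy has np.sum(), and TVM likewise has te.sum. This makes it convenient to build a computation graph in TVM's language. Let us trace the code behind these operators using an official example.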

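import tvm
from tvm import te

# Symbolic shape variables and an (n, m) input tensor.
n = te.var("n")
m = te.var("m")
A = te.placeholder((n, m), name="A")
# Reduce over the second axis: B[i] = sum_k A[i, k].
k = te.reduce_axis((0, m), "k")
B = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name="B")
s = te.create_schedule(B.op)
print(tvm.lower(s, [A], simple_mode=True))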

The lowered function printed is:

primfn(A_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {A: Buffer(A_2: Pointer(float32), float32, [n: int32, m: int32], [stride: int32, stride_1: int32], type="auto")}
  buffer_map = {A_1: A} {
  attr [B: Pointer(float32)] "storage_scope" = "global";
  allocate(B, float32, [n]);
  for (i: int32, 0, n) {
    B[i] = 0f32
    for (k: int32, 0, m) {
      B[i] = ((float32*)B[i] + (float32*)A_2[((i*stride) + (k*stride_1))])
    }
  }
}
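To make the loop nest easier to read, here is a plain Python rendering of the same computation. It is only an illustration: row_sum is a hypothetical helper, the buffer is modeled as a flat Python list, and the stride values in the usage lines are assumptions chosen for a 2x3 example.

```python
def row_sum(a_flat, n, m, stride, stride_1):
    """Mirror of the lowered loops: B[i] = sum over k of A[i, k]."""
    b = [0.0] * n                      # allocate(B, float32, [n])
    for i in range(n):
        b[i] = 0.0                     # B[i] = 0f32
        for k in range(m):
            # A is addressed through its strides, as in the printed IR.
            b[i] = b[i] + a_flat[i * stride + k * stride_1]
    return b


# A 2x3 matrix stored row-major: stride = 3 between rows, 1 between columns.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = row_sum(a, n=2, m=3, stride=3, stride_1=1)
print(result)  # [6.0, 15.0]
```

Because the buffer is indexed through stride and stride_1 rather than a fixed layout, the same loops work for any memory layout of A.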

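te.var is loosely similar to a variable in TensorFlow: it creates a variable. Its call chain is python/tvm/te/operation.py -> python/tvm/tir/expr.py -> src/tir/ir/expr.cc. Var inherits from PrimExpr, and creating a Var creates a VarNode, which stores the variable's type, name, and related information.

Placeholder is likewise similar to a placeholder in TensorFlow. It ultimately creates a PlaceholderOp that stores the name, shape, and dtype.

Next, let us focus on compute. The Python entry point is in python/tvm/te/operation.py. Its main job is to convert a lambda function into an operator.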

def compute(shape, fcompute, name="compute", tag="", attrs=None):

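fcompute corresponds to the lambda expression. The names and number of its arguments are first extracted from fcompute's __code__, and an IterVar is created for each argument together with its corresponding shape entry. An IterVar is somewhat like the loop variable of a for loop; IterVar is the abstraction of that idea. The IterVars are then passed to fcompute to create the function body. After that, a ComputeOp or a TensorComputeOp is built, depending on the type of the body that fcompute returns. Here we trace ComputeOp, which is implemented in src/te/operation/compute_op.cc. It likewise constructs a ComputeOpNode and records the data and expression information.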

    # Excerpt from the body of compute(): one IterVar per output axis.
    dim_var = [tvm.tir.IterVar((0, s), x, 0) for x, s in zip(arg_names, shape[:out_ndim])]
    body = fcompute(*[v.var for v in dim_var])

    if isinstance(body, _tensor.TensorIntrinCall):
        # Tensor-intrinsic body: add the remaining axes and build a TensorComputeOp.
        for i, s in enumerate(shape[out_ndim:]):
            var_name = "ax" + str(i)
            dim_var.append(tvm.tir.IterVar((0, s), var_name, 4))
        op_node = _ffi_api.TensorComputeOp(name,
                                           tag,
                                           dim_var,
                                           body.reduce_axis,
                                           out_ndim,
                                           body.intrin,
                                           body.tensors,
                                           body.regions,
                                           body.scalar_inputs)
    else:
        # Ordinary expression body: normalize to a list and build a ComputeOp.
        if not isinstance(body, (list, tuple)):
            body = [body]
        body = convert(body)
        op_node = _ffi_api.ComputeOp(
            name, tag, attrs, dim_var, body)

    num = op_node.num_outputs
    outputs = tuple(op_node.output(i) for i in range(num))
    return outputs[0] if num == 1 else outputs
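The step that reads argument names from fcompute's __code__ and pairs them with the shape can be reproduced in plain Python. This is a sketch under stated assumptions rather than TVM's code: ToyIterVar and make_dim_vars are hypothetical, the extent is stored as a bare value instead of a range, and the fallback names for a lambda with no named arguments are an assumption modeled on the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyIterVar:
    """Toy loop variable: a name, an extent, and an iteration type code."""
    name: str
    extent: object
    iter_type: int


def make_dim_vars(shape, fcompute):
    """Create one ToyIterVar per named argument of fcompute."""
    code = fcompute.__code__
    if code.co_argcount == 0:
        # No named arguments (e.g. lambda *idx): make up one name per axis.
        arg_names = ["i%d" % i for i in range(len(shape))]
    else:
        # co_varnames lists arguments first, then locals; keep arguments only.
        arg_names = code.co_varnames[:code.co_argcount]
    out_ndim = len(arg_names)
    if out_ndim > len(shape):
        raise ValueError("fcompute takes more arguments than shape has axes")
    # 0 mirrors the third IterVar argument used for output axes in the excerpt.
    return [ToyIterVar(x, s, 0) for x, s in zip(arg_names, shape[:out_ndim])]


dim_var = make_dim_vars((4, 8), lambda i, j: None)

# Calling fcompute with the loop variables yields the body; strings stand in
# for symbolic expressions here.
body = (lambda i, j: f"A[{i}, {j}] * 2")(*[v.name for v in dim_var])
print(dim_var)
print(body)
```

This shows why the lambda's parameter names end up as loop variable names in the lowered output: they are taken directly from the function's code object.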