Model Quantization: pytorch2onnx

wholetus

Last modified 2023-06-27 09:44:10

First published 2022-08-31 11:18:06

Original link: https://blog.csdn.net/wholetus/article/details/125447128

Creating an operation in PyTorch

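First check whether PyTorch already provides the operation. PyTorch lets you implement custom layers and extend it with special operators, and it also lets you write the backward pass of a non-differentiable operation yourself. For example, although PyTorch computes gradients automatically, some operations are not differentiable, and you then have to define how their gradient is computed. This is what the PyTorch docs call "Extending torch.autograd".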

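Defining a custom PyTorch op means extending PyTorch.
How to extend it: subclass autograd.Function.
A subclass of autograd.Function only needs to implement two static methods:
- forward: computes the op's forward pass.
  - By the time forward runs, its arguments are plain Tensors (since PyTorch 0.4, Variable has been merged into Tensor).
  - forward's parameters may have default values, and the defaults can be any Python object.
  - It can return any number of Tensors.
  - Any Python operation may be used inside it, but the returned values must be Tensors.
- backward: computes the gradients.
  - It takes one gradient argument per value returned by forward, plus ctx.
  - It must return one value per forward parameter (excluding ctx); use None for inputs that need no gradient.
  - Its arguments are Tensors.
  - Its return values must be Tensors (or None).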
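To make these rules concrete, here is a minimal sketch of the non-differentiable case mentioned above, in a quantization flavour: rounding has zero gradient almost everywhere, so a custom backward is needed. The class name RoundSTE is hypothetical, and the straight-through estimator (STE) used in backward is one common choice, not something this post prescribes.

```python
import torch
from torch.autograd import Function


class RoundSTE(Function):
    """Hypothetical op: round() in forward, straight-through estimator in backward."""

    @staticmethod
    def forward(ctx, x):
        # torch.round has zero gradient almost everywhere, so plain autograd would stall training
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # forward has one parameter (x) besides ctx, so backward returns exactly one gradient.
        # STE: treat round() as the identity and pass the incoming gradient through unchanged.
        return grad_output


x = torch.tensor([0.2, 1.6, -0.7], requires_grad=True)
y = RoundSTE.apply(x)  # custom Functions are always called through .apply
y.sum().backward()
print(y.detach())  # rounded values
print(x.grad)      # all ones: the gradient of sum() passed straight through
```

forward takes one tensor and returns one tensor, so backward receives one gradient and returns one gradient, exactly as the rules require.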

Following these steps, I defined my own LinearFunction:

import torch
from torch.autograd import Function
from torch.autograd import gradcheck

'''
    symbolic can be thought of as specifying the output format of the PyTorch -> ONNX conversion.
    Put simply, we are creating our own non-standard, non-ATen ONNX operator (op);
    the corresponding symbolic in my code looks like this.
'''
class LinearFunction(Function):

    # beta and alpha do nothing in the computation; they only demonstrate that a custom op
    # can carry extra parameters through the torch -> ONNX export.
    # After g, symbolic must take the same parameters as forward (excluding ctx).
    @staticmethod
    def symbolic(g, input, weight, bias=None, beta=1.0, alpha=1.0):
        inputs = [input, weight] if bias is None else [input, weight, bias]
        # the "_f" suffix marks a float attribute on the emitted ONNX node
        return g.op("nonentity", *inputs, beta_f=beta, alpha_f=alpha)

    # forward and backward must both be static methods!
    @staticmethod
    # bias is optional and defaults to None
    def forward(ctx, input, weight, bias=None, beta=1.0, alpha=1.0):
        # input and weight are plain Tensors here
        # use ctx to stash whatever backward will need
        # ctx.save_for_backward only accepts tensors or None,
        # and is intended for forward's input or output tensors
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # forward has a single return value, so backward takes a single gradient argument
    @staticmethod
    def backward(ctx, grad_output):
        # backward runs with grad mode disabled unless backward(create_graph=True) is used
        # grad_output is a Tensor
        # unpack the saved tensors first, then initialise every gradient to None
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # the needs_input_grad checks are optional; drop them for simpler code.
        # Returning a gradient for an input that doesn't need one is not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0)
        # one return value per forward parameter (excluding ctx), in the same order;
        # the non-tensor beta and alpha get None
        return grad_input, grad_weight, grad_bias, None, None

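That is the whole process of subclassing Function. PyTorch offers both Function and Module abstractions: LinearFunction can be called directly as a function, like F.conv2d, or wrapped in a Module and used like nn.Conv2d.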

Using LinearFunction directly

# input and weight are Tensors
def linear(input, weight, bias=None):
    # a custom Function must be invoked through apply(), which sets up ctx
    # and records the op in the autograd graph
    return LinearFunction.apply(input, weight, bias)

if __name__ == '__main__':
    in_ = torch.randn((20, 20), requires_grad=True, dtype=torch.double)
    weight_ = torch.randn((20, 20), requires_grad=True, dtype=torch.double)

    res = linear(in_, weight_)

    loss = res.sum()  # reduce to a scalar
    # backpropagate: since loss = sum(y), grad_outputs = dloss/dy = 1, so it can be omitted
    loss.backward()

    # print(in_.grad)
    # print(weight_.grad)

    # gradcheck compares backward's analytical gradients with numerical finite
    # differences; double precision is needed for the check to be reliable
    inputs = (torch.randn((20, 20), requires_grad=True, dtype=torch.double),
              torch.randn((30, 20), requires_grad=True, dtype=torch.double))
    test = gradcheck(LinearFunction.apply, inputs, eps=1e-6, atol=1e-4)
    # prints True if the check passes
    print(test)

Extending LinearFunction into a Module

Extending a Module is simple: override __init__ and forward of nn.Module.

import torch
import torch.nn as nn

class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features

        # nn.Parameter is a special kind of Tensor that gets automatically
        # registered as the Module's parameter once it's assigned as an
        # attribute. Parameters and buffers need to be registered, or they
        # won't appear in .parameters() (doesn't apply to buffers) and won't
        # be converted when e.g. .cuda() is called. You can use
        # .register_buffer() to register buffers.
        # Important: unlike plain Tensors, Parameters require gradients by default!
        self.weight = nn.Parameter(torch.empty(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)

        # Not a very smart way to initialize weights
        self.weight.data.uniform_(-0.1, 0.1)
        # check the registered parameter, not the bool flag `bias`
        if self.bias is not None:
            self.bias.data.uniform_(-0.1, 0.1)

    def forward(self, input):
        # See the autograd section for what happens inside apply()
        return LinearFunction.apply(input, self.weight, self.bias)
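As a quick sanity check of the Module wrapper, the sketch below re-declares a compact LinearFunction (symbolic omitted, since nothing is exported here, and without the beta/alpha demo parameters) together with the Linear module, then compares outputs and input gradients with the built-in torch.nn.functional.linear. The compact re-declaration only keeps the snippet self-contained; the bias=False case exercises the parameter registered as None.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function


class LinearFunction(Function):
    # compact copy of the Function above, without symbolic (no export in this snippet)
    @staticmethod
    def forward(ctx, input, weight, bias=None):
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        grad_bias = grad_output.sum(0) if bias is not None else None
        return grad_output.mm(weight), grad_output.t().mm(input), grad_bias


class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(output_features, input_features).uniform_(-0.1, 0.1))
        if bias:
            self.bias = nn.Parameter(torch.empty(output_features).uniform_(-0.1, 0.1))
        else:
            self.register_parameter('bias', None)

    def forward(self, input):
        return LinearFunction.apply(input, self.weight, self.bias)


torch.manual_seed(0)
x = torch.randn(8, 4, requires_grad=True)
layer = Linear(4, 3)

# the custom layer and the built-in functional linear should agree
out = layer(x)
ref = F.linear(x, layer.weight, layer.bias)
print(torch.allclose(out, ref, atol=1e-6))

# compare gradients w.r.t. the input as well
g_custom, = torch.autograd.grad(out.sum(), x)
g_ref, = torch.autograd.grad(ref.sum(), x)
print(torch.allclose(g_custom, g_ref, atol=1e-6))

# bias=False registers bias as None, so it is not listed among the parameters
no_bias = Linear(4, 3, bias=False)
print([name for name, _ in no_bias.named_parameters()])  # ['weight']
```

Checking against F.linear complements gradcheck: gradcheck only shows that backward matches forward, while this comparison also shows that forward itself computes a standard linear layer.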

Making torch.onnx recognize the custom op

The custom op fails during ONNX export

When I ran torch.onnx.export on a model that uses the custom op, to produce the protobuf binary file, the exporter failed once it reached the custom op:

...
  %19 : Float(64, 64, 3, 3) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%18), scope: Net/Sequential[conv3]/MaxPool2d[2]
  %20 : Float(64, 576) = onnx::Flatten[axis=1](%19), scope: Net
  %input.5 : Float(64, 128) = ^LinearFunction()(%20, %dense.0.weight, %dense.0.bias), scope: Net/Sequential[dense]/Linear[0]
  %22 : Float(64, 128) = onnx::Relu(%input.5), scope: Net/Sequential[dense]/ReLU[1]
  %23 : Float(64, 10) = ^LinearFunction()(%22, %dense.2.weight, %dense.2.bias), scope: Net/Sequential[dense]/Linear[2]

The graph shows an undefined operator, ^LinearFunction.
The fix is to make torch.onnx understand my custom op, LinearFunction.

The op is already standardized in ONNX

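The operators ONNX supports, usually for its latest version, are documented in Operators.md in the onnx source repository on GitHub.
If the operator is already standardized, i.e. it appears in that list, and in your torch version the operation is an ATen operator, i.e. its definition can be found in torch/csrc/autograd/generated/VariableType.h,
then add a symbolic for it in torch/onnx/symbolic.py (later PyTorch releases split this file into symbolic_opset*.py files) and follow the instructions below: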

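Define the symbolic in torch/onnx/symbolic.py. Make sure the function has the same name as the ATen operator/function defined in VariableType.h.
The first parameter is always the exported ONNX graph. The parameter names must exactly match the names in VariableType.h, because dispatch is done with keyword arguments.
The parameter order does not need to match VariableType.h exactly: tensors (inputs) always come first, followed by the non-tensor arguments.
In the symbolic function, if the operator is already standardized in ONNX, we only need to create a node that represents the ONNX operator in the graph.
If an input argument is a tensor but ONNX expects a scalar, we have to convert it explicitly. The helper _scalar converts a scalar tensor into a Python scalar, and _if_scalar_type_as turns a Python scalar into a PyTorch tensor.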

The op is not standardized

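If it is not standardized, torch.onnx has no definition for the op either, i.e. it is a non-ATen operator, and the symbolic function has to be added to the corresponding PyTorch Function class. Please read the following instructions:

Create a symbolic function named symbolic in the corresponding Function class.
The first parameter is always the exported ONNX graph.
Apart from the first one, the parameter names must exactly match those of forward.
The size of the output tuple must exactly match the outputs of forward.
In the symbolic function, if the operator is already standardized in ONNX, we only need to create a node that represents the ONNX operator in the graph.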

Handling the custom LinearFunction op

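As mentioned in my earlier post on getting started with custom ops in PyTorch 1.1.0 (Python), the definition of LinearFunction includes a @staticmethod function, symbolic(), known as the symbolic function. Later experiments showed that it is the function that lets torch.onnx.export recognize the custom operation when converting to ONNX.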

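At first I didn't know how to use it. Looking at the ops in torch/onnx/symbolic.py, most of them return g.op(...), and the first argument of g.op is the node name that appears in the exported ONNX model, for example: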

def stack(g, tensor_list, dim):
    unsqueezed = [g.op("Unsqueeze", t, axes_i=[dim]) for t in _unpack_list(tensor_list)]
    return g.op("Concat", *unsqueezed, axis_i=dim)


def mm(g, self, other):
    # Create a dummy C tensor. Only needed for API purposes, the value is
    # irrelevant since beta = 0
    ty = _try_get_scalar_type(self, other).lower()
    C = g.constant(0, [1], ty)
    return g.op("Gemm", self, other, C, beta_f=0.0, alpha_f=1.0)

The resulting ONNX node names are therefore "Concat", "Gemm", and so on.
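Both symbolic functions above simply re-express a PyTorch op with ONNX building blocks. The sketch below checks the decompositions numerically in plain PyTorch, with no export involved: torch.cat/unsqueeze stand in for ONNX Concat/Unsqueeze, and torch.addmm stands in for Gemm. stack should equal the concatenation of unsqueezed tensors, and mm should equal a Gemm whose C term is a dummy zero tensor scaled by beta = 0.

```python
import torch

torch.manual_seed(0)

# stack(tensors, dim) == Concat([Unsqueeze(t, axes=[dim]) for t in tensors], axis=dim)
ts = [torch.randn(2, 3) for _ in range(4)]
dim = 1
stacked = torch.stack(ts, dim=dim)
concat_of_unsqueezed = torch.cat([t.unsqueeze(dim) for t in ts], dim=dim)
print(torch.equal(stacked, concat_of_unsqueezed))  # True

# mm(a, b) == Gemm(a, b, C) with beta=0, alpha=1; C is a dummy that only satisfies the API
a = torch.randn(4, 5)
b = torch.randn(5, 6)
dummy_c = torch.zeros(1)  # broadcasts against the (4, 6) result and is ignored since beta=0
gemm_like = torch.addmm(dummy_c, a, b, beta=0.0, alpha=1.0)
print(torch.allclose(a.mm(b), gemm_like))
```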

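After exporting a standard torch.nn.Linear() to ONNX, I found that the fully connected layer is represented as "Gemm".
Digging through the ops already defined in torch/onnx/symbolic.py, addmm is the one that emits just a single Gemm node, so torch.nn.Linear() should be going through addmm.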

def addmm(g, self, mat1, mat2, beta, alpha):
    return g.op("Gemm", mat1, mat2, self, beta_f=_scalar(beta), alpha_f=_scalar(alpha))
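The argument order in g.op("Gemm", mat1, mat2, self, ...) follows from the two formulas: ONNX Gemm computes Y = alpha * A @ B + beta * C, while torch.addmm(self, mat1, mat2, beta=..., alpha=...) computes beta * self + alpha * (mat1 @ mat2), so self becomes Gemm's C input. The sketch below checks this numerically, and also checks that a 2-D linear layer equals addmm(bias, x, weight.t()), which is consistent with nn.Linear showing up as Gemm. It only compares results; it does not show which kernel nn.Linear dispatches to internally.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
beta, alpha = 1.2, 1.3

# addmm(self, mat1, mat2) = beta*self + alpha*(mat1 @ mat2)  <->  Gemm(A=mat1, B=mat2, C=self)
self_ = torch.randn(4, 5)
mat1 = torch.randn(4, 3)
mat2 = torch.randn(3, 5)
out = torch.addmm(self_, mat1, mat2, beta=beta, alpha=alpha)
gemm = alpha * (mat1 @ mat2) + beta * self_  # ONNX Gemm formula with transA = transB = 0
print(torch.allclose(out, gemm, atol=1e-5))

# a 2-D linear layer y = x @ W^T + b is addmm with beta = alpha = 1
x = torch.randn(8, 3)
W = torch.randn(5, 3)  # (out_features, in_features), the nn.Linear weight layout
b = torch.randn(5)
print(torch.allclose(F.linear(x, W, b), torch.addmm(b, x, W.t()), atol=1e-5))
```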

So, in symbolic() I followed addmm's example and defined an almost identical structure:

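    @staticmethod
    def symbolic(g, input, weight, bias=None, beta=1.0, alpha=1.0):
        # same shape as addmm's symbolic: three inputs plus float attributes beta and alpha;
        # after g, the parameters mirror forward's parameters
        inputs = [input, weight] if bias is None else [input, weight, bias]
        return g.op("nonentity", *inputs, beta_f=beta, alpha_f=alpha)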

PyTorch-to-ONNX export code

torch.onnx.export(model,
                  (data, indices, updates),
                  "vfe.onnx",
                  # export_params=True,
                  opset_version=13,
                  # do_constant_folding=True,
                  # keep_initializers_as_inputs=True,
                  input_names=["data", "indices", "updates"],
                  output_names=["output"],
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)

torch.onnx.export in detail

operator_export_type (enum, default None)

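Defaults to OperatorExportTypes.ONNX, or to OperatorExportTypes.ONNX_ATEN_FALLBACK if PyTorch was built with -DPYTORCH_ONNX_CAFFE2_BUNDLE.

The enum values are:

OperatorExportTypes.ONNX: export all ops as ONNX ops.

OperatorExportTypes.ONNX_FALLTHROUGH: try to export all ops as ONNX ops, but when an op cannot be converted (e.g. ONNX does not implement it), export it as a custom op. For the exported model to be usable, the runtime must support these custom ops.

OperatorExportTypes.ONNX_ATEN: export all ATen ops as ATen ops. ATen is PyTorch's built-in tensor library, so the model relies directly on PyTorch's implementations. (Models exported this way can only be used directly by Caffe2.)

OperatorExportTypes.ONNX_ATEN_FALLBACK: try to convert ATen ops to ONNX ops as well, and fall back to ATen ops for those that cannot be converted. (Models exported this way can only be used directly by Caffe2.)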

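With the symbolic in place, the corresponding part of the exported graph looks like this (the former ^LinearFunction nodes are now onnx::nonentity nodes carrying the alpha and beta attributes):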

...
 %19 : Float(64, 64, 3, 3) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%18), scope: Net_LinearFunction/Sequential[conv3]/MaxPool2d[2]
 %20 : Float(64, 576) = onnx::Flatten[axis=1](%19), scope: Net_LinearFunction
 %21 : Float(64, 128) = onnx::nonentity[alpha=1.3, beta=1.2](%20, %dense.0.weight, %dense.0.bias), scope: Net_LinearFunction/Sequential[dense]/Linear[0]
 %22 : Float(64, 128) = onnx::Relu(%21), scope: Net_LinearFunction/Sequential[dense]/ReLU[1]
 %23 : Float(64, 10) = onnx::nonentity[alpha=1.33, beta=1.22](%22, %dense.2.weight, %dense.2.bias), scope: Net_LinearFunction/Sequential[dense]/Linear[2]
 return (%23)