Model Deployment (Part 1)

Cindy_1224

Last modified on 2023-04-18 13:51:13

Category: Model Deployment. Tags: pytorch, deep learning

First published on 2023-04-18 11:30:51

Article link: https://blog.csdn.net/cindywry/article/details/130217457

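Problems addressed by model deployment:

1. Model deployment is the process of running a trained model in a specific target environment. It has to solve two main problems: poor compatibility between model frameworks and slow model execution.

2. The common deployment pipeline is "deep learning framework -> intermediate representation -> inference engine". ONNX is one of the most widely used intermediate representations.

3. A deep learning model is essentially a computational graph. For deployment, the model is usually converted into a static computational graph, that is, a graph without control flow (branches or loops).

4. Model conversion: training frameworks generally provide an interface for converting a model to the ONNX intermediate representation. PyTorch, for example, ships with ONNX support: construct a set of random inputs and call torch.onnx.export on the model to convert it from PyTorch to ONNX.

5. Running the model on a given platform requires an inference engine. ONNX Runtime supports ONNX models natively: given a .onnx file, a few calls to the ONNX Runtime Python API are enough to run inference. Many platforms, however, have their own inference frameworks: NVIDIA platforms are supported by the TensorRT inference engine, and TI chips by the TIDL inference framework. The inference engine should therefore be chosen according to the actual application scenario.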
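Points 3 and 4 can be made concrete with a toy tracer. This is only a sketch of the idea behind tracing, not PyTorch's implementation: the Traced class, the OPS table, and the model function are all hypothetical names invented for illustration. The model is run once on an example input, every operation that actually executes is recorded, and the resulting list is a static graph in which the Python branch has disappeared.

```python
import operator

# Toy op table; the names mimic ONNX operator types but nothing here uses ONNX.
OPS = {"Add": operator.add, "Mul": operator.mul}


class Traced:
    """Wraps a number and appends every operation applied to it to a shared graph."""

    def __init__(self, value, graph, name):
        self.value = value
        self.graph = graph
        self.name = name

    def _apply(self, op, other):
        # other is assumed to be a plain Python number in this sketch.
        result = OPS[op](self.value, other)
        out_name = f"t{len(self.graph)}"
        self.graph.append((op, self.name, repr(other), out_name))
        return Traced(result, self.graph, out_name)

    def __add__(self, other):
        return self._apply("Add", other)

    def __mul__(self, other):
        return self._apply("Mul", other)


def model(x):
    # Ordinary Python control flow: the branch taken depends on the input value.
    if x.value > 0:
        return x * 2 + 1
    return x * -1


def trace(fn, example_input):
    """Run fn once on example_input and return the list of recorded nodes."""
    graph = []
    fn(Traced(example_input, graph, "x"))
    return graph


print(trace(model, 3.0))   # nodes recorded for the positive branch
print(trace(model, -3.0))  # a different graph: only the other branch was seen
```

Each trace contains only the operations of the branch that ran, which is why a traced graph is static and why the choice of example input matters when a model contains control flow.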

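Difficulties in model deployment:

Common difficulties in model deployment fall into three categories: making the model dynamic, implementing new operators, and compatibility between the intermediate representation and the inference engine.

1. Making the model dynamic. For performance reasons, inference frameworks assume by default that a model's input shapes, output shapes, and structure are static. To make the model more general, deployment has to make its inputs, outputs, or structure dynamic while changing the original logic as little as possible.

2. Implementing new operators. Deep learning evolves quickly, and new operators are often proposed faster than the ONNX maintainers can support them. To deploy the latest models, deployment engineers often have to add support for new operators in both ONNX and the inference engine themselves.

3. Compatibility between the intermediate representation and the inference engine. Because inference engines are implemented differently, they do not support ONNX uniformly. To make a model behave the same across inference engines, deployment engineers often have to customize the model code for a particular engine, which adds considerable work.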
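The first difficulty can be illustrated with shapes alone. The sketch below is a toy, with hypothetical function names and no inference engine involved. It contrasts a graph whose scale factor is a baked-in constant with one that receives the scale factor as a second runtime input. The shape rule, floor(dim * scale) per dimension, is the one used by the ONNX Resize operator when scales are given.

```python
import math


def resize_output_shape(input_shape, scales):
    """Output shape of a Resize-like op: floor(dim * scale) per dimension."""
    return [math.floor(d * s) for d, s in zip(input_shape, scales)]


def static_model(input_shape):
    # The scale is a constant baked into the graph, as in a fixed 3x upscaling model.
    return resize_output_shape(input_shape, [1, 1, 3, 3])


def dynamic_model(input_shape, scales):
    # The scale is a second runtime input, so one exported graph serves any factor.
    return resize_output_shape(input_shape, scales)


print(static_model([1, 3, 256, 256]))
print(dynamic_model([1, 3, 256, 256], [1, 1, 4, 4]))
```

With the static variant, changing the factor means exporting a new model. With the dynamic variant, the caller changes only the second input.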

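How model conversion works

1. Taking PyTorch as an example, converting to ONNX means mapping each PyTorch operator to one or more operators defined by ONNX. For example, both Upsample and interpolate in PyTorch end up as the ONNX Resize operator after conversion. The default mapping is not always suitable, which is why custom operators are sometimes needed.

2. Overriding the symbolic method of an operator that inherits from torch.autograd.Function changes how that operator is mapped to ONNX operators.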

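For a PyTorch operator to convert to ONNX successfully, all three of the following must hold:

1. The operator is implemented in PyTorch.

2. There is a way to map the PyTorch operator to one or more ONNX operators, that is, a symbolic function (a function attached to the operator that describes the mapping rule).

3. ONNX has the corresponding operator(s).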
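The three conditions above can be sketched as a lookup chain. Everything in this block is hypothetical: the operator tables are tiny stand-ins, and register_symbolic and convert are illustrative names, not PyTorch APIs. The sketch shows how a conversion can fail at any of the three links, and how interpolate and upsample can share one symbolic function that emits Resize.

```python
# Hypothetical, heavily reduced tables; real PyTorch and ONNX define far more operators.
TORCH_OPS = {"interpolate", "upsample", "conv2d", "my_new_op"}
ONNX_OPS = {"Resize", "Conv"}

# Maps a PyTorch operator name to its symbolic function.
SYMBOLIC_REGISTRY = {}


def register_symbolic(*torch_op_names):
    """Decorator that registers one symbolic function for several PyTorch operators."""
    def decorator(fn):
        for name in torch_op_names:
            SYMBOLIC_REGISTRY[name] = fn
        return fn
    return decorator


@register_symbolic("interpolate", "upsample")
def resize_symbolic(inputs):
    return [("Resize", inputs)]


@register_symbolic("conv2d")
def conv_symbolic(inputs):
    return [("Conv", inputs)]


def convert(torch_op, inputs):
    """Return the ONNX nodes for one PyTorch operator, or raise naming the failed link."""
    if torch_op not in TORCH_OPS:
        raise LookupError(f"{torch_op}: not implemented in PyTorch")
    symbolic = SYMBOLIC_REGISTRY.get(torch_op)
    if symbolic is None:
        raise LookupError(f"{torch_op}: no symbolic function registered")
    nodes = symbolic(inputs)
    for onnx_op, _ in nodes:
        if onnx_op not in ONNX_OPS:
            raise LookupError(f"{torch_op}: ONNX has no operator {onnx_op}")
    return nodes


print(convert("interpolate", ["x", "scales"]))
print(convert("upsample", ["x", "scales"]))
```

In this toy, my_new_op exists on the PyTorch side but has no symbolic function, so convert("my_new_op", ...) fails at the second link. That is the situation in which a custom symbolic function has to be written.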

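Custom operators

Symbolic function: it can be viewed as a static method of the PyTorch operator class. When a PyTorch model is converted to ONNX, the symbolic function of each PyTorch operator is called in turn to convert that operator to ONNX operators.

About .pyi files:

A .pyi file contains only type information and no runtime code. It is a Python skeleton (stub) with the appropriate structure, call signatures, and return types, matching the functions, attributes, classes, and methods defined in a module. Here it can be understood as the declaration of PyTorch's calling interface.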
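To show what "signatures without runtime code" looks like, the sketch below parses a small stub with the standard ast module and lists the parameter names. The stub text is a hypothetical, simplified example written for this illustration. It is not copied from PyTorch, whose real stubs are much larger and may use different signatures.

```python
import ast

# Hypothetical, simplified stub content; the real PyTorch .pyi files are much larger
# and their exact signatures may differ.
STUB_SOURCE = '''
def interpolate(input: Tensor, scale_factor: Optional[List[float]] = None,
                mode: str = "nearest") -> Tensor: ...
'''


def list_signatures(stub_source):
    """Map each function name in the stub to its positional parameter names."""
    tree = ast.parse(stub_source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


print(list_signatures(STUB_SOURCE))
```

The function body in the stub is just `...`, so the file tells you how to call an operator and which inputs a symbolic function has to handle, but nothing about how the operator computes.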

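Steps for adding a symbolic function:

To add a symbolic function, look up the operator's interface definition in the corresponding .pyi file.

The following code is an example of a custom operator: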

import cv2
import numpy as np
import torch
import torch.onnx
from torch import nn
import onnx
import onnxruntime
from torch.nn.functional import interpolate


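# How to read a symbolic function's signature:
# def symbolic(g: torch._C.Graph,          # the graph being built (a class implemented in C++)
#               input_0: torch._C.Value,   # an input value (a class implemented in C++)
#               input_1: torch._C.Value,   # an input value (a class implemented in C++)
#               ...):
# A Resize operator with a dynamic upscaling factor:
# define a PyTorch operator that performs the interpolation, then map it to the ONNX Resize operator we want.
class NewInterpolate_op(torch.autograd.Function):
    # To match the scales input of ONNX Resize, the scaling ratio is a tensor of the form [1, 1, x, x],
    # where x is the upscaling factor. In the earlier model with a fixed 3x upscale it was hard-coded to [1, 1, 3, 3].
    # Here the second input of the operator is a tensor [1, 1, h, w], where h and w are the scale factors
    # for image height and width (NCHW order).
    # An operator's symbolic method decides how it is mapped to ONNX operators.
    # scales should be determined by the input at run time, so the scales argument of symbolic
    # is passed straight through as the scales input of ONNX Resize.
    @staticmethod
    def symbolic(g, input, scales):
        return g.op("Resize",  # ONNX operator type, e.g. "Conv"
                    input,
                    # Empty constant for the roi input of Resize, which is not used here
                    g.op("Constant",
                         value_t=torch.tensor([], dtype=torch.float32)),
                    scales,
                    coordinate_transformation_mode_s="pytorch_half_pixel",
                    cubic_coeff_a_f=-0.75,
                    mode_s='cubic',
                    nearest_mode_s="floor")

    # Adapt the [1, 1, h, w] input to the original interpolate function.
    @staticmethod
    def forward(ctx, input, scales):
        # Take the last two elements of scales and pass them as a list to interpolate's scale_factor.
        scales = scales.tolist()[-2:]
        return interpolate(input,
                           scale_factor=scales,
                           mode='bicubic',
                           align_corners=False)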

class SuperResolutionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4)
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0)
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2)
        self.relu = nn.ReLU()

    def forward(self, x, upscale_factor):
        # Insert the custom operator, which supports a dynamic output resolution
        x = NewInterpolate_op.apply(x, upscale_factor)
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return out

# Load the srcnn.pth weights into torch_model so that no training is needed
def init_torch_model():
    # torch_model = SuperResolutionNet(upscale_factor=3)
    torch_model = SuperResolutionNet()
    state_dict = torch.load('srcnn.pth')['state_dict']
    # Adapt the checkpoint
    for old_key in list(state_dict.keys()):
        new_key = '.'.join(old_key.split('.')[1:])
        state_dict[new_key] = state_dict.pop(old_key)
    torch_model.load_state_dict(state_dict)
    torch_model.eval()
    return torch_model

# Run the model directly through its forward pass
def test_srcnn():
    model = init_torch_model()
    factor = torch.tensor([1, 1, 3, 3], dtype=torch.float)
    input_img = cv2.imread('face.png').astype(np.float32)
    input_img = cv2.resize(input_img, (256, 256))
    # HWC to NCHW
    input_img = np.transpose(input_img, [2, 0, 1])
    input_img = np.expand_dims(input_img, 0)
    # Inference
    # The scale factor is supplied at run time
    torch_output = model(torch.from_numpy(input_img), factor).detach().numpy()
    # NCHW to HWC
    torch_output = np.squeeze(torch_output, 0)
    torch_output = np.clip(torch_output, 0, 255)
    torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8)
    # Show image
    # cv2.imwrite("face_torch.png", torch_output)
    cv2.imwrite("face_torch_2.png", torch_output)

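# Convert the PyTorch model to ONNX format.
# Why does export need a set of example inputs? This comes down to how ONNX conversion works: going from a
# PyTorch model to an ONNX model is essentially a translation between languages.
# The intuitive approach would be to parse the model's code completely, like a compiler, and record all control flow.
# In practice ONNX is usually used to record a static graph without control flow, so that is unnecessary.
# PyTorch therefore provides a conversion method called tracing: given a set of inputs,
# it actually runs the model once and records the computational graph for those inputs, which is saved in ONNX format.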
def model_onnx_export():
    x = torch.randn(1, 3, 256, 256)
    factor = torch.tensor([1, 1, 3, 3], dtype=torch.float)
    model = init_torch_model()
    with torch.no_grad():
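        torch.onnx.export(  # PyTorch's built-in function for converting a model to ONNX
            model,  # the model to export
            (x, factor),  # example inputs to the model
            "srcnn_op.onnx",  # file name of the exported ONNX model (the name loaded by the functions below)
            opset_version=11,  # ONNX operator set version to target
            input_names=['input', 'factor'],  # names of the input and output tensors
            output_names=['output'])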

def onnx_model_check():
    onnx_model = onnx.load("srcnn_op.onnx")
    try:
        onnx.checker.check_model(onnx_model)
    except Exception:
        print("Model incorrect")
    else:
        print("Model correct")

def ort_infer():
    ort_session = onnxruntime.InferenceSession("srcnn_op.onnx")
    input_img = cv2.imread('face.png').astype(np.float32)
    input_img = cv2.resize(input_img, (256, 256))
    # HWC to NCHW
    input_img = np.transpose(input_img, [2, 0, 1])
    input_img = np.expand_dims(input_img, 0)
    # Make sure the input is a float32 array
    input_img = np.array(input_img, dtype=np.float32)

    # ort_inputs = {'input': input_img}
    # A factor different from the one used at export time, to exercise the dynamic scales input
    input_factor = np.array([1, 1, 4, 4], dtype=np.float32)
    ort_inputs = {'input': input_img, 'factor': input_factor}

    # ort_output = ort_session.run(['output'], ort_inputs)[0]
    ort_output = ort_session.run(None, ort_inputs)[0]
    ort_output = np.squeeze(ort_output, 0)
    ort_output = np.clip(ort_output, 0, 255)
    ort_output = np.transpose(ort_output, [1, 2, 0]).astype(np.uint8)
    cv2.imwrite("face_ort_op.png", ort_output)

if __name__ == '__main__':
    # test_srcnn()
    model_onnx_export()
    # onnx_model_check()
    # ort_infer()
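
The script above writes the PyTorch result and the ONNX Runtime result to image files, but never compares them. A possible follow-up check is sketched below. outputs_match is a hypothetical helper, not part of the original script, and the tolerances are arbitrary starting values. It assumes test_srcnn and ort_infer have been adapted to return their raw float arrays and were run with the same scale factor; small stand-in arrays are used here so the block runs on its own.

```python
import numpy as np


def outputs_match(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (ok, max_abs_diff) for two arrays; a shape mismatch counts as failure."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        return False, float("inf")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    ok = bool(np.allclose(reference, candidate, rtol=rtol, atol=atol))
    return ok, max_abs_diff


# Stand-ins for the PyTorch and ONNX Runtime outputs.
a = np.ones((1, 3, 4, 4), dtype=np.float32)
b = a + 1e-6
print(outputs_match(a, b))
print(outputs_match(a, np.ones((1, 3, 8, 8), dtype=np.float32)))
```

The shape check matters here: the script runs PyTorch with a factor of 3 and ONNX Runtime with a factor of 4, so as written the two outputs have different sizes and cannot be compared element by element.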