深度学习模型部署学习二

spade_eddie

已于 2022-07-19 11:49:45 修改

阅读量957

点赞数 3

分类专栏：模型部署文章标签：深度学习学习人工智能

于 2022-07-15 14:42:34 首次发布

原文链接：https://zhuanlan.zhihu.com/p/479290520

版权

模型部署专栏收录该内容

6 篇文章 14 订阅

订阅专栏

解决模型部署中的难题

学习链接：模型部署入门教程（二）：解决模型部署中的难题
写在前面： 本文档为学习上述链接的相关记录，基本内容一致，仅用于学习用途，若侵权请联系我删除

文章目录

解决模型部署中的难题

1 模型部署中常见的难题

学习记录一中存在的问题：图片的放大倍数固定是 4，我们无法让图片放大任意的倍数。

模型部署时一般会碰到以下几类困难：

模型的动态化。出于性能的考虑，各推理框架都默认模型的输入形状、输出形状、结构是静态的。而为了让模型的泛用性更强，部署时需要在尽可能不影响原有逻辑的前提下，让模型的输入输出或是结构动态化。
新算子的实现。深度学习技术日新月异，提出新算子的速度往往快于 ONNX 维护者支持的速度。为了部署最新的模型，部署工程师往往需要自己在 ONNX 和推理引擎中支持新算子。
中间表示与推理引擎的兼容问题。由于各推理引擎的实现不同，对 ONNX 难以形成统一的支持。为了确保模型在不同的推理引擎中有同样的运行效果，部署工程师往往得为某个推理引擎定制模型代码，这为模型部署引入了许多工作量。

2 实现动态放大的超分辨率模型

在原来的 SRCNN 中，图片的放大比例是写死在模型里的。通过使用 upscale_factor 来控制模型的放大比例。初始化模型的时候，默认令 upscale_factor 为 3，生成了一个放大 3 倍的 PyTorch 模型。这个 PyTorch 模型最终被转换成了 ONNX 格式的模型。如果我们需要一个放大 4 倍的模型，需要重新生成一遍模型，再做一次到 ONNX 的转换。

解决办法：

修改原来的模型，令模型的放大倍数变成推理时的输入。

import os
import cv2
import numpy as np
import requests
import torch
import torch.onnx
from torch import nn
from torch.nn.functional import interpolate


# 定义的超分辨率网络结构
class SuperResolutionNet(nn.Module):
    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4)
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0)
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2)

        self.relu = nn.ReLU()

    def forward(self, x, upscale_factor):
        # 原来的上采样模块定义在__init__中，现在作为参数定义在前向传播中
        x = interpolate(x,
                        scale_factor=upscale_factor,
                        mode='bicubic',
                        align_corners=False)
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return out

    # 初始化模型参数


def init_torch_model():
    torch_model = SuperResolutionNet()
    state_dict = torch.load('srcnn.pth')['state_dict']

    # 加载模型参数
    for old_key in list(state_dict.keys()):
        new_key = '.'.join(old_key.split('.')[1:])
        state_dict[new_key] = state_dict.pop(old_key)

    torch_model.load_state_dict(state_dict)
    torch_model.eval()
    return torch_model


if __name__ == '__main__':

    # 下载权重文件.pth 和 测试图片
    urls = [
        'https://download.openmmlab.com/mmediting/restorers/srcnn/srcnn_x4k915_1x16_1000k_div2k_20200608-4186f232.pth',
        'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.mp.itc.cn%2Fupload%2F20170604%2F10c6fad2080f438bae1ce98b0d07eed2_th.jpg&refer=http%3A%2F%2Fimg.mp.itc.cn&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1660440355&t=ba210d286709983734a43f3451c4629e']
    names = ['srcnn.pth', 'face.png']
    for url, name in zip(urls, names):
        if not os.path.exists(name):
            open(name, 'wb').write(requests.get(url).content)

    model = init_torch_model()
    input_img = cv2.imread('face.png').astype(np.float32)

    # 输入图片格式转换：HWC to NCHW
    input_img = np.transpose(input_img, [2, 0, 1])
    input_img = np.expand_dims(input_img, 0)

    # 推理， 这里多了一个放大系数的参数 3
    torch_output = model(torch.from_numpy(input_img), 3).detach().numpy()

    # 推理结果转换到图片格式：NCHW to HWC
    torch_output = np.squeeze(torch_output, 0)
    torch_output = np.clip(torch_output, 0, 255)
    torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8)

    # 保存推理图片
    cv2.imwrite("face_torch_2.png", torch_output)

SuperResolutionNet 未修改之前，nn.Upsample 在初始化阶段固化了放大倍数，而 PyTorch 的 interpolate 插值算子可以在运行阶段选择放大倍数。
因此，我们在新脚本中使用 interpolate 代替 nn.Upsample，从而让模型支持动态放大倍数的超分。在使用模型推理时，我们把放大倍数设置为 3。最后，图片保存在文件 “face_torch_2.png” 中。一切正常的话，“face_torch_2.png” 和 “face_torch.png” 的内容一模一样。

通过简单的修改，PyTorch 模型已经支持了动态分辨率。

导出中间表达onnx

# 这里的尺寸需要和模型的输入一致
# 紧接上文中的代码
x = torch.randn(1, 3, 718, 640)
 
with torch.no_grad(): 
    torch.onnx.export( 
        model, 
        (x, 3), 
        "srcnn2.onnx", 
        opset_version=11, 
        input_names=['input', 'factor'], 
        output_names=['output'])

运行这些脚本时，会报一长串错误。因为碰到了模型部署中的兼容性问题。

TypeError: upsample_bicubic2d() received an invalid combination of arguments - got (Tensor, NoneType, bool, list), but expected one of:
 * (Tensor input, tuple of ints output_size, bool align_corners, tuple of floats scale_factors)
      didn't match because some of the arguments have invalid types: (Tensor, !NoneType!, bool, !list!)
 * (Tensor input, tuple of ints output_size, bool align_corners, float scales_h, float scales_w, *, Tensor out)

3. 自定义算子

直接使用 PyTorch 模型的话，我们修改几行代码就能实现模型输入的动态化。但在模型部署中，我们要花数倍的时间来设法解决这一问题。

刚刚的报错是因为 PyTorch 模型在导出到 ONNX 模型时，模型的输入参数的类型必须全部是 torch.Tensor。而实际上我们传入的第二个参数 " 3 "是一个整形变量。这不符合 PyTorch 转 ONNX 的规定。我们必须要修改一下原来的模型的输入。为了保证输入的所有参数都是 torch.Tensor 类型的，我们做如下修改：


class SuperResolutionNet(nn.Module): 
 
    def forward(self, x, upscale_factor): 
        x = interpolate(x, 
        				# 修改1，取upscale_factor的值，而不是tensor
                        scale_factor=upscale_factor.item(), 
                        mode='bicubic', 
                        align_corners=False) 
 
... 
 
# 修改2：推理时的传参 
# 注意，第二个输入参数为 torch.tensor(3) 
torch_output = model(torch.from_numpy(input_img), torch.tensor(3)).detach().numpy() 
 
... 
 
with torch.no_grad(): 
	# 修改3：导出时的传参 
	# 注意，第二个输入参数为 (x, torch.tensor(3) )
    torch.onnx.export(model, (x, torch.tensor(3)), 
                      "srcnn2.onnx", 
                      opset_version=11, 
                      input_names=['input', 'factor'], 
                      output_names=['output'])

由于 PyTorch 中 interpolate 的 scale_factor 参数必须是一个数值，我们使用 torch.Tensor.item() 来把只有一个元素的 torch.Tensor 转换成数值。之后，在模型推理时，我们使用 torch.tensor(3) 代替 3，以使得我们的所有输入都满足要求。现在运行脚本的话，无论是直接运行模型，还是导出 ONNX 模型，都不会报错了。

但是，导出 ONNX 时却报了一条 TraceWarning 的警告。这条警告说有一些量可能会追踪失败。

TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale_factor=upscale_factor.item(),

这是怎么回事呢？让我们把生成的 srcnn2.onnx 用 Netron 可视化一下：
在这里插入图片描述可以发现，虽然我们把模型推理的输入设置为了两个，但 ONNX 模型还是长得和原来一模一样，只有一个叫 " input " 的输入。

这是由于我们使用了 torch.Tensor.item() 把数据从 Tensor 里取出来，而导出 ONNX 模型时这个操作是无法被记录的，只好报了一条 TraceWarning。这导致 interpolate 插值函数的放大倍数还是被设置成了" 3 " 这个固定值，我们导出的" srcnn2.onnx " 和最开始的 " srcnn.onnx "完全相同。

直接修改原来的模型似乎行不通，我们得从 PyTorch 转 ONNX 的原理入手，强行令 ONNX 模型明白我们的想法了。

仔细观察 Netron 上可视化出的 ONNX 模型，可以发现在 PyTorch 中无论是使用最早的 nn.Upsample，还是后来的 interpolate，PyTorch 里的插值操作最后都会转换成 ONNX 定义的 Resize 操作。

也就是说，所谓 PyTorch 转 ONNX，实际上就是把每个 PyTorch 的操作映射成了 ONNX 定义的算子。

点击该算子，可以看到它的详细参数如下：
在这里插入图片描述
其中，展开 scales，可以看到:

scales 是一个长度为 4 的一维张量;
其内容为 [1, 1, 3, 3], 表示 Resize 操作每一个维度的缩放系数；
其类型为 Initializer，表示这个值是根据常量直接初始化出来的。

如果我们能够自己生成一个 ONNX 的 Resize 算子，让 scales 成为一个可变量而不是常量，就像它上面的 X 一样，那这个超分辨率模型就能动态缩放了。

现有实现插值的 PyTorch 算子有一套规定好的映射到 ONNX Resize 算子的方法，这些映射出的 Resize 算子的 scales 只能是常量，无法满足我们的需求。实现方式：

自己定义一个实现插值的 PyTorch 算子
让它映射到一个我们期望的 ONNX Resize 算子上

下面的脚本定义了一个 PyTorch 插值算子，并在模型里使用了它。我们先通过运行模型来验证该算子的正确性：

import os
import cv2
import numpy as np
import requests
import torch
import torch.onnx
from torch import nn
from torch.nn.functional import interpolate


# 实现插值的算子
class NewInterpolate(torch.autograd.Function):

    @staticmethod
    def symbolic(g, input, scales):
        return g.op("Resize",
                    input,
                    g.op("Constant",
                         value_t=torch.tensor([], dtype=torch.float32)),
                    scales,
                    coordinate_transformation_mode_s="pytorch_half_pixel",
                    cubic_coeff_a_f=-0.75,
                    mode_s='cubic',
                    nearest_mode_s="floor")

    @staticmethod
    def forward(ctx, input, scales):
        scales = scales.tolist()[-2:]
        return interpolate(input,
                           scale_factor=scales,
                           mode='bicubic',
                           align_corners=False)


# 定义的超分辨率网络结构
class StrangeSuperResolutionNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4)
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0)
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2)

        self.relu = nn.ReLU()

    def forward(self, x, upscale_factor):
        # 挂载算子
        x = NewInterpolate.apply(x, upscale_factor)
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return out


def init_torch_model():
    torch_model = StrangeSuperResolutionNet()
    state_dict = torch.load('srcnn.pth')['state_dict']

    # 加载权重参数
    for old_key in list(state_dict.keys()):
        new_key = '.'.join(old_key.split('.')[1:])
        state_dict[new_key] = state_dict.pop(old_key)

    torch_model.load_state_dict(state_dict)
    torch_model.eval()
    return torch_model


if __name__ == '__main__':
    # 下载权重文件.pth 和 测试图片
    urls = [
        'https://download.openmmlab.com/mmediting/restorers/srcnn/srcnn_x4k915_1x16_1000k_div2k_20200608-4186f232.pth',
        'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fimg.mp.itc.cn%2Fupload%2F20170604%2F10c6fad2080f438bae1ce98b0d07eed2_th.jpg&refer=http%3A%2F%2Fimg.mp.itc.cn&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=auto?sec=1660440355&t=ba210d286709983734a43f3451c4629e']
    names = ['srcnn.pth', 'face.png']
    for url, name in zip(urls, names):
        if not os.path.exists(name):
            open(name, 'wb').write(requests.get(url).content)

    model = init_torch_model()
    input_img = cv2.imread('face.png').astype(np.float32)

    # 输入图片格式转换：HWC to NCHW
    input_img = np.transpose(input_img, [2, 0, 1])
    input_img = np.expand_dims(input_img, 0)

    # 推理
    # 缩放因子
    factor = torch.tensor([1, 1, 3, 3], dtype=torch.float)
    torch_output = model(torch.from_numpy(input_img), factor).detach().numpy()

    # 推理结果转换到图片格式：NCHW to HWC
    torch_output = np.squeeze(torch_output, 0)
    torch_output = np.clip(torch_output, 0, 255)
    torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8)

    # 保存推理图片
    cv2.imwrite("face_torch_3.png", torch_output)

模型运行正常的话，一幅放大3倍的超分辨率图片会保存在"face_torch_3.png"中，其内容和"face_torch.png"完全相同。

在刚刚那个脚本中，我们定义 PyTorch 插值算子的代码如下：

class NewInterpolate(torch.autograd.Function): 
 
    @staticmethod 
    def symbolic(g, input, scales): 
        return g.op("Resize", 
                    input, 
                    g.op("Constant", 
                         value_t=torch.tensor([], dtype=torch.float32)), 
                    scales, 
                    coordinate_transformation_mode_s="pytorch_half_pixel", 
                    cubic_coeff_a_f=-0.75, 
                    mode_s='cubic', 
                    nearest_mode_s="floor") 
 
    @staticmethod 
    def forward(ctx, input, scales): 
        scales = scales.tolist()[-2:] 
        return interpolate(input, 
                           scale_factor=scales, 
                           mode='bicubic', 
                           align_corners=False)

思路： 我们希望新的插值算子有两个输入，一个是被用于操作的图像，一个是图像的放缩比例。前面讲到，为了对接 ONNX 中 Resize 算子的 scales 参数，这个放缩比例是一个 [1, 1, x, x] 的张量，其中 x 为放大倍数。在之前放大3倍的模型中，这个参数被固定成了[1, 1, 3, 3]。
因此，在插值算子中，我们希望模型的第二个输入是一个 [1, 1, w, h] 的张量，其中 w 和 h 分别是图片宽和高的放大倍数。
算子的具体实现： 算子的推理行为由算子的 foward 方法决定。该方法的第一个参数必须为 ctx，后面的参数为算子的自定义输入，我们设置两个输入，分别为被操作的图像和放缩比例。为保证推理正确，需要把 [1, 1, w, h] 格式的输入对接到原来的 interpolate 函数上。我们的做法是截取输入张量的后两个元素，把这两个元素以 list 的格式传入 interpolate 的 scale_factor 参数。

接下来，我们要决定新算子映射到 ONNX 算子的方法。映射到 ONNX 的方法由一个算子的symbolic 方法决定。symbolic 方法第一个参数必须是g，之后的参数是算子的自定义输入和 forward 函数一样。ONNX 算子的具体定义由 g.op 实现。g.op 的每个参数都可以映射到 ONNX 中的算子属性：

在这里插入图片描述
对于其他参数，我们可以照着现在的 Resize 算子填。而要注意的是，我们现在希望 scales 参数是由输入动态决定的。因此，在填入 ONNX 的 scales 时，我们要把 symbolic 方法的输入参数中的 scales 填入。

接着，让我们把新模型导出成 ONNX 模型：

x = torch.randn(1, 3, 256, 256) 
 
with torch.no_grad(): 
    torch.onnx.export(model, (x, factor), 
                      "srcnn3.onnx", 
                      opset_version=11, 
                      input_names=['input', 'factor'], 
                      output_names=['output'])

把导出的 " srcnn3.onnx " 进行可视化：

在这里插入图片描述

可以看到，正如我们所期望的，导出的 ONNX 模型有了两个输入！第二个输入表示图像的放缩比例。

之前在验证 PyTorch 模型和导出 ONNX 模型时，我们宽高的缩放比例设置成了 3x3。现在，在用 ONNX Runtime 推理时，我们尝试使用 4x4 的缩放比例：

import cv2
import onnxruntime
import numpy as np


input_img = cv2.imread('face.png').astype(np.float32)
input_img = np.transpose(input_img, [2, 0, 1])
input_img = np.expand_dims(input_img, 0)

input_factor = np.array([1, 1, 4, 4], dtype=np.float32)
ort_session = onnxruntime.InferenceSession("srcnn3.onnx")
ort_inputs = {'input': input_img, 'factor': input_factor}
ort_output = ort_session.run(None, ort_inputs)[0]

ort_output = np.squeeze(ort_output, 0)
ort_output = np.clip(ort_output, 0, 255)
ort_output = np.transpose(ort_output, [1, 2, 0]).astype(np.uint8)
cv2.imwrite("face_ort_3.png", ort_output)

运行上面的代码，可以得到一个边长放大4倍的超分辨率图片 “face_ort_3.png”。动态的超分辨率模型生成成功了！只要修改 input_factor，我们就可以自由地控制图片的缩放比例。

我们刚刚的工作，实际上是绕过 PyTorch 本身的限制，凭空“捏”出了一个 ONNX 算子。

事实上，我们不仅可以创建现有的 ONNX 算子，还可以定义新的 ONNX 算子以拓展 ONNX 的表达能力。

总结

模型部署中常见的几类困难有：
- 模型的动态化；
- 新算子的实现；
- 框架间的兼容。
PyTorch 转 ONNX，实际上就是把每一个操作转化成 ONNX 定义的某一个算子。比如对于 PyTorch 中的 Upsample 和 interpolate，在转 ONNX 后最终都会成为 ONNX 的 Resize 算子。
通过修改继承自 torch.autograd.Function 的算子的 symbolic 方法，可以改变该算子映射到 ONNX 算子的行为。