"Deep-Learning-with-PyTorch" Study Notes, Chapter 15: Deployment (Part 3)

LearnerzzZ

Categories: Deep Learning, Machine Learning. Tags: machine learning, deep learning, neural networks

Article link: https://blog.csdn.net/LearnerzzZ/article/details/115751524

(Notes from self-study of "Deep-Learning-with-PyTorch"; for reference only.)

【Exporting models】

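During deployment we may run into further problems: the GIL may block our improved web server, or we may need to run on embedded systems with special constraints, where Python is too expensive or simply unavailable.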

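【Note on the GIL (Global Interpreter Lock)】:

{A thread must hold the GIL before it can execute Python bytecode in the interpreter, so only one thread runs Python code at any moment. The interpreter does not let one thread hold the lock indefinitely: it releases the lock at regular intervals and lets the threads take turns, which makes execution look parallel to the user even though it is only interleaved.}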
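The effect of the GIL can be observed with a small experiment. The sketch below is not from the book; it assumes a standard CPython build with the GIL enabled, and `count_down` and `run_in_threads` are hypothetical helper names. It runs the same CPU-bound loop twice in sequence and then in two threads; on such a build the threaded version is usually not noticeably faster, because only one thread executes Python bytecode at a time.

```python
import sys
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1
    return n


def run_in_threads(num_threads, n):
    """Run count_down(n) in num_threads threads and collect the results."""
    results = [None] * num_threads

    def worker(i):
        results[i] = count_down(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    n = 2_000_000
    # How often the interpreter asks the running thread to give up the GIL.
    print("thread switch interval (s):", sys.getswitchinterval())

    start = time.perf_counter()
    count_down(n)
    count_down(n)
    print("sequential: ", time.perf_counter() - start)

    start = time.perf_counter()
    run_in_threads(2, n)
    print("two threads:", time.perf_counter() - start)
```

The exact timings depend on the machine and Python version, so none are shown here.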

This leads to several options:

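1. Leave PyTorch entirely and move to a more specialized framework.

2. Stay in the PyTorch ecosystem and use the JIT, a just-in-time compiler for a PyTorch-centric subset of Python. Even when we run the JITed model from Python, we are after two advantages: (1) sometimes the JIT enables useful optimizations, and (2) a JITed model avoids the GIL, which helps our web server.

3. Run the model under libtorch (PyTorch's C++ library), or with the derived Torch Mobile.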

【Interoperability beyond PyTorch with ONNX】

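ONNX (Open Neural Network Exchange) provides an interoperability format for neural networks and machine learning models. Once exported, a model can be executed by any ONNX-compatible runtime, provided that both the ONNX standard and the target runtime support the operations our model uses. Besides traditional hardware, many specialized AI accelerators support ONNX.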

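In a sense, a deep learning model is a program with a very specific instruction set, made up of granular operations such as matrix multiplication, convolution, relu, and tanh. So if we can serialize the computation, we can re-execute it in another runtime that understands these low-level operations. ONNX is a standardized format for describing those operations and their parameters.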

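To export a model to ONNX, we need to run it on a dummy input. The values of the input tensors do not matter; what matters is that they have the correct shape and type.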

When we call torch.onnx.export, PyTorch traces the computations the model performs and serializes them into an ONNX file with the name we provide:

torch.onnx.export(seg_model, dummy_input, "seg_model.onnx")

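The resulting ONNX file can now be run in a runtime, compiled for an edge device, or uploaded to a cloud service. It can also be used from Python after installing onnxruntime or onnxruntime-gpu, with batches passed in as NumPy arrays.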

【Note】:
Not all TorchScript operators can be represented as standardized ONNX operators.

【PyTorch’s own export: Tracing】

When interoperability is not the main concern but we still need to escape the Python GIL or otherwise export our network, we can use PyTorch's own representation: TorchScript.

The simplest way to produce a TorchScript model is to trace it, which looks a lot like ONNX export.

We feed a dummy input to the model using the torch.jit.trace function.

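There is one additional caveat before we trace the model: none of the parameters should require gradients, because the torch.no_grad() context manager is strictly a runtime switch. Even if we trace the model inside no_grad and then run it outside, PyTorch will still record gradients. The reason is that after tracing, we ask PyTorch to execute the traced model; its parameters still require gradients when the recorded operations run, and they make everything computed from them require gradients too. To avoid that, we would have to run the traced model inside a torch.no_grad context.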

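To spare ourselves this (it is easy to forget, and the result is a surprising loss of performance), we loop over the model parameters and set each of them to not require gradients. After that, all we need to do is call torch.jit.trace.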
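The difference between the two approaches can be sketched as follows. This is an illustrative example rather than the book's code: `make_model` builds a hypothetical small network in place of the segmentation model. The first model is traced inside no_grad with its parameters left untouched; the second has gradients switched off on every parameter before tracing. Printing `requires_grad` on each output shows whether autograd is still tracking the computation.

```python
import torch
from torch import nn


def make_model():
    """Hypothetical small network standing in for the real model."""
    return nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU()).eval()


dummy_input = torch.randn(1, 3, 8, 8)

# Case A: trace inside no_grad, but leave requires_grad set on the parameters.
model_a = make_model()
with torch.no_grad():
    traced_a = torch.jit.trace(model_a, dummy_input)
out_a = traced_a(dummy_input)  # executed outside no_grad

# Case B: switch off gradients on every parameter, then trace.
model_b = make_model()
for p in model_b.parameters():
    p.requires_grad_(False)
traced_b = torch.jit.trace(model_b, dummy_input)
out_b = traced_b(dummy_input)  # no no_grad needed at run time

print("case A output requires grad:", out_a.requires_grad)
print("case B output requires grad:", out_b.requires_grad)
```

With case B, callers of the traced model do not have to remember to wrap every call in torch.no_grad().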

We can save the traced model with:

torch.jit.save(traced_seg_model, 'traced_seg_model.pt')

The model can then be loaded from the saved file alone and called:

loaded_model = torch.jit.load('traced_seg_model.pt')
prediction = loaded_model(batch)

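The PyTorch JIT preserves the state the model was in when we saved it: we had put it in evaluation mode, and our parameters do not require gradients. Had we not taken care of this beforehand, we would need to wrap execution in with torch.no_grad():.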
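A self-contained version of the save and load round trip might look like the sketch below. The small `nn.Sequential` network and the temporary file name are assumptions made for illustration; only torch.jit.trace, torch.jit.save, and torch.jit.load come from the text above.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical small model, prepared the way the notes describe:
# evaluation mode, and no parameter requires gradients.
model = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU()).eval()
for p in model.parameters():
    p.requires_grad_(False)

batch = torch.randn(2, 3, 8, 8)
traced = torch.jit.trace(model, batch)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traced_tiny_model.pt")
    torch.jit.save(traced, path)
    # Loading needs only the file, not the Python class that defined the model.
    loaded = torch.jit.load(path, map_location="cpu")

prediction = loaded(batch)
print(prediction.shape, prediction.requires_grad)
```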

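【Tip】:

You can run a JITed and exported PyTorch model without keeping the source code. However, we always want to establish a workflow in which we can automatically go from the source model to the installed JITed model for deployment. Otherwise we will find ourselves wanting to tweak something in the model while having lost the ability to modify and regenerate it. Always keep the source!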

【Our server with a traced model】

Next, we iterate our web server to its final version.

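All we need to do is load the model in our server with torch.jit.load instead of get_pretrained_model. This means our model runs independently of the GIL, which is exactly what we wanted our server to achieve here.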
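The server hands the model call to a worker thread with run_in_executor, so the event loop can keep accepting requests while inference runs. The pattern in isolation looks roughly like this sketch, which uses only the standard library. `blocking_inference` is a hypothetical stand-in that sleeps instead of running a model (time.sleep releases the GIL, which is what a JITed model is meant to do during execution), and `heartbeat` exists only to show that the loop stays responsive.

```python
import asyncio
import functools
import time


def blocking_inference(x):
    """Hypothetical stand-in for a model call; blocks its thread for a while."""
    time.sleep(0.2)
    return x * 2


async def heartbeat(counter, stop):
    """Counts ticks while the loop is free to run other coroutines."""
    while not stop.is_set():
        counter["ticks"] += 1
        await asyncio.sleep(0.01)


async def main():
    loop = asyncio.get_running_loop()
    counter = {"ticks": 0}
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(counter, stop))
    # Run the blocking call in the default thread pool and await its result.
    result = await loop.run_in_executor(
        None, functools.partial(blocking_inference, 21)
    )
    stop.set()
    await hb
    return result, counter["ticks"]


result, ticks = asyncio.run(main())
print("result:", result, "heartbeat ticks while waiting:", ticks)
```

If the blocking call were awaited directly on the event loop instead, the heartbeat would not advance until it finished.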

【request_batching_jit_server.py】：

import sys
import asyncio
import itertools
import functools
from sanic import Sanic
from sanic.response import json, text
from sanic.log import logger
from sanic.exceptions import ServerError

import sanic
import threading
import PIL.Image
import io
import torch
import torchvision

app = Sanic(__name__)

device = torch.device('cpu')
# we only run 1 inference run at any time (one could schedule between several runners if desired)
MAX_QUEUE_SIZE = 3  # we accept a backlog of MAX_QUEUE_SIZE before handing out "Too busy" errors
MAX_BATCH_SIZE = 2  # we put at most MAX_BATCH_SIZE things in a single batch
MAX_WAIT = 1        # we wait at most MAX_WAIT seconds before running for more inputs to arrive in batching

class HandlingError(Exception):
    def __init__(self, msg, code=500):
        super().__init__()
        self.handling_code = code
        self.handling_msg = msg

class ModelRunner:
    def __init__(self, model_name):
        self.model_name = model_name
        self.queue = []
        self.queue_lock = None
        self.model = torch.jit.load(self.model_name, map_location=device)
        self.needs_processing = None
        self.needs_processing_timer = None

    def schedule_processing_if_needed(self):
        if len(self.queue) >= MAX_BATCH_SIZE:
            logger.debug("next batch ready when processing a batch")
            self.needs_processing.set()
        elif self.queue:
            logger.debug("queue nonempty when processing a batch, setting next timer")
            self.needs_processing_timer = app.loop.call_at(self.queue[0]["time"] + MAX_WAIT, self.needs_processing.set)

    async def process_input(self, input):
        # asyncio.Event binds to the running loop; the loop= argument was removed in Python 3.10
        our_task = {"done_event": asyncio.Event(),
                    "input": input,
                    "time": app.loop.time()}
        async with self.queue_lock:
            if len(self.queue) >= MAX_QUEUE_SIZE:
                raise HandlingError("I'm too busy", code=503)
            self.queue.append(our_task)
            logger.debug("enqueued task. new queue size {}".format(len(self.queue)))
            self.schedule_processing_if_needed()
        await our_task["done_event"].wait()
        return our_task["output"]

    def run_model(self, batch):  # runs in other thread
        return self.model(batch.to(device)).to('cpu')

    async def model_runner(self):
        # created inside the running loop; the loop= argument was removed in Python 3.10
        self.queue_lock = asyncio.Lock()
        self.needs_processing = asyncio.Event()
        logger.info("started model runner for {}".format(self.model_name))
        while True:
            await self.needs_processing.wait()
            self.needs_processing.clear()
            if self.needs_processing_timer is not None:
                self.needs_processing_timer.cancel()
                self.needs_processing_timer = None
            async with self.queue_lock:
                if self.queue:
                    longest_wait = app.loop.time() - self.queue[0]["time"]
                else:  # oops
                    longest_wait = None
                logger.debug("launching processing. queue size: {}. longest wait: {}".format(len(self.queue), longest_wait))
                to_process = self.queue[:MAX_BATCH_SIZE]
                del self.queue[:len(to_process)]
                self.schedule_processing_if_needed()
            # so here we copy, it would be neater to avoid this
            batch = torch.stack([t["input"] for t in to_process], dim=0)
            # we could delete inputs here...

            result = await app.loop.run_in_executor(
                None, functools.partial(self.run_model, batch)
            )
            for t, r in zip(to_process, result):
                t["output"] = r
                t["done_event"].set()
            del to_process

style_transfer_runner = ModelRunner(sys.argv[1])

@app.route('/image', methods=['PUT'], stream=True)
async def image(request):
    try:
        print(request.headers)
        content_length = int(request.headers.get('content-length', '0'))
        MAX_SIZE = 2**22  # 4 MiB
        if content_length:
            if content_length > MAX_SIZE:
                raise HandlingError("Too large")
            data = bytearray(content_length)
        else:
            data = bytearray(MAX_SIZE)
        pos = 0
        while True:
            # so this still copies too much stuff.
            data_part = await request.stream.read()
            if data_part is None:
                break
            data[pos: len(data_part) + pos] = data_part
            pos += len(data_part)
            if pos > MAX_SIZE:
                raise HandlingError("Too large")

        # ideally, we would minimize preprocessing...
        im = PIL.Image.open(io.BytesIO(data))
        im = torchvision.transforms.functional.resize(im, (228, 228))
        im = torchvision.transforms.functional.to_tensor(im)
        im = im[:3]  # drop alpha channel if present
        if im.dim() != 3 or im.size(0) < 3 or im.size(0) > 4:
            raise HandlingError("need rgb image")
        out_im = await style_transfer_runner.process_input(im)
        out_im = torchvision.transforms.functional.to_pil_image(out_im)
        imgByteArr = io.BytesIO()
        out_im.save(imgByteArr, format='JPEG')
        return sanic.response.raw(imgByteArr.getvalue(), status=200,
                                  content_type='image/jpeg')
    except HandlingError as e:
        # we don't want these to be logged...
        return sanic.response.text(e.handling_msg, status=e.handling_code)

app.add_task(style_transfer_runner.model_runner())
app.run(host="0.0.0.0", port=8000,debug=True)
