0 TorchServe
TorchServe is a high-performance, flexible, and easy-to-use tool for serving PyTorch eager mode and TorchScript models.
1 TorchServe Usage Tutorial
There are three ways to configure TorchServe:
1.1 Environment variables
TorchServe can be configured by setting the following environment variables:
JAVA_HOME
PYTHONPATH
TS_CONFIG_FILE
LOG_LOCATION
METRICS_LOCATION
Environment variables take the highest precedence among the configuration sources.
1.2 Command-line arguments
The following command-line arguments customize TorchServe:
--ts-config: If the TS_CONFIG_FILE environment variable is not set, TorchServe loads the configuration file given by this argument.
--model-store: Overrides the model_store property in config.properties.
--models: Overrides the load_models property in config.properties.
--log-config: Overrides the default log4j2.xml.
--foreground: Runs TorchServe in the foreground. If this option is disabled, TorchServe runs in the background.
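For example, a typical invocation combining these flags might look like the following (the file and directory names are illustrative):
torchserve --start \
    --ts-config config.properties \
    --model-store model_store \
    --models all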
1.3 Configuration file
TorchServe stores its configuration in a config.properties file. TorchServe looks for this file in the following order of precedence:
If the TS_CONFIG_FILE environment variable is set, TorchServe loads the configuration from the path it specifies.
If the --ts-config argument is passed to torchserve, TorchServe loads the configuration from the path given by that argument.
If there is a config.properties file in the folder from which you call torchserve, TorchServe loads it from the current working directory.
If none of the above is provided, TorchServe loads a built-in configuration with default values.
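A minimal config.properties sketch (the addresses and paths are illustrative; the keys are standard TorchServe properties):
# config.properties
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
model_store=model_store
load_models=all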
2 Customizing TorchServe with Your Own Model
2.1 Custom handler workflow
You customize TorchServe's behavior with a Python script that is packaged alongside the model by the model archiver. TorchServe executes this code at runtime.
The TorchServe handler flow:
1. Initialize the model instance:
def initialize(self, context)
2. Preprocess the input data before it is sent to the model for inference or for Captum explanations:
def preprocess(self, data)
3. Customize how the model is called for inference or explanations:
def inference(self, model_input)
4. Postprocess the model's output after the response is returned (this usually needs to be customized):
def postprocess(self, inference_output)
2.2 Basic request parameters
data - the input data from the request that needs to be processed
context - the TorchServe context information. It provides, among other things: model_name, model_dir, manifest, batch_size, gpu
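As a brief illustration, a handler can read these values from the context like this (a sketch based on the TorchServe context object):
# inside a handler method such as initialize(self, context):
properties = context.system_properties
model_dir = properties.get("model_dir")    # directory the .mar archive was extracted to
gpu_id = properties.get("gpu_id")          # GPU id assigned to this worker, if any
batch_size = properties.get("batch_size")  # configured batch size for this model
manifest = context.manifest                # archive metadata, e.g. serializedFile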
2.3 Starting from BaseHandler
The stock BaseHandler already implements most of the functionality; in most cases you only need to override the preprocess and postprocess methods, as in the sketch below.
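A minimal sketch of such a handler, assuming the client sends numeric arrays in the request body (the decoding logic is illustrative):
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # turn the raw request payloads into a float tensor batch
        rows = [row.get("data") or row.get("body") for row in data]
        return torch.as_tensor(rows, dtype=torch.float32).to(self.device)

    def postprocess(self, inference_output):
        # TorchServe expects a JSON-serializable list, one entry per request
        return inference_output.cpu().tolist()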
For reference, this is how BaseHandler's initialize loads the model weights:
# Initialize model weights
def initialize(self, context):
    """First try to load torchscript else load eager mode state_dict based model"""
    properties = context.system_properties
    self.map_location = (
        "cuda"
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else "cpu"
    )
    self.device = torch.device(
        self.map_location + ":" + str(properties.get("gpu_id"))
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else self.map_location
    )
    self.manifest = context.manifest

    model_dir = properties.get("model_dir")
    serialized_file = self.manifest["model"]["serializedFile"]
    model_pt_path = os.path.join(model_dir, serialized_file)
    if not os.path.isfile(model_pt_path):
        raise RuntimeError("Missing the model.pt file")

    # model def file
    model_file = self.manifest["model"].get("modelFile", "")
    if model_file:
        logger.debug("Loading eager model")
        self.model = self._load_pickled_model(model_dir, model_file, model_pt_path)
    else:
        logger.debug("Loading torchscript model")
        self.model = self._load_torchscript_model(model_pt_path)

    self.model.to(self.device)
    self.model.eval()

    logger.debug("Model file %s loaded successfully", model_pt_path)
    self.initialized = True
2.4 Advanced custom handlers
2.4.1 Returning a custom error code (module level)
from ts.utils.util import PredictionException

def handle(data, context):
    # Some unexpected error - returning error code 513
    raise PredictionException("Some Prediction Error", 513)
2.4.2 Returning a custom error code (class level)
from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import PredictionException

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def handle(self, data, context):
        # Some unexpected error - returning error code 513
        raise PredictionException("Some Prediction Error", 513)
2.5 Writing a complete custom handler from BaseHandler
# custom handler file
# model_handler.py
"""
ModelHandler defines a custom model handler.
"""
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def __init__(self):
        self._context = None
        self.initialized = False
        self.explain = False
        self.target = 0

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time
        :param context: Initial context contains model server system properties.
        :return:
        """
        self._context = context
        self.initialized = True
        # load the model, refer 'custom handler class' above for details

    def preprocess(self, data):
        """
        Transform raw input into model input data.
        :param batch: list of raw requests, should match batch size
        :return: list of preprocessed model input data
        """
        # Take the input data and make it inference ready
        preprocessed_data = data[0].get("data")
        if preprocessed_data is None:
            preprocessed_data = data[0].get("body")
        return preprocessed_data

    def inference(self, model_input):
        """
        Internal inference methods
        :param model_input: transformed model input data
        :return: list of inference output in NDArray
        """
        # Do some inference call to engine here and return output
        model_output = self.model.forward(model_input)
        return model_output

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: list of inference output
        :return: list of predict results
        """
        # Take output from network and post-process to desired format
        postprocess_output = inference_output
        return postprocess_output

    def handle(self, data, context):
        """
        Invoke by TorchServe for prediction request.
        Do pre-processing of data, prediction using model and postprocessing of prediction output
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        return self.postprocess(model_output)
ModelHandler inherits from BaseHandler; the core entry point is the handle() method.
2.5.1 __init__ initializes handler state
def __init__(self):
    self._context = None
    self.initialized = False
    self.explain = False
    self.target = 0
2.5.2 initialize loads the model weights
def initialize(self, context):
    """First try to load torchscript else load eager mode state_dict based model"""
    properties = context.system_properties
    self.map_location = (
        "cuda"
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else "cpu"
    )
    self.device = torch.device(
        self.map_location + ":" + str(properties.get("gpu_id"))
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else self.map_location
    )
    self.manifest = context.manifest

    model_dir = properties.get("model_dir")
    serialized_file = self.manifest["model"]["serializedFile"]
    model_pt_path = os.path.join(model_dir, serialized_file)
    if not os.path.isfile(model_pt_path):
        raise RuntimeError("Missing the model.pt file")

    # model def file
    model_file = self.manifest["model"].get("modelFile", "")
    if model_file:
        logger.debug("Loading eager model")
        self.model = self._load_pickled_model(model_dir, model_file, model_pt_path)
    else:
        logger.debug("Loading torchscript model")
        self.model = self._load_torchscript_model(model_pt_path)

    self.model.to(self.device)
    self.model.eval()

    logger.debug("Model file %s loaded successfully", model_pt_path)
    self.initialized = True
torch.jit.trace
torch.jit.trace() takes a trained eager-mode model plus example inputs, runs the inputs through the eager model once, and records the tensor operations that execute; the recording is saved as a TorchScript module.
Its main drawbacks are that it does not support control flow, data structures (list, dict, etc.), or Python constructs, and some operations may be recorded incorrectly in the TorchScript module without any warning, so the output is not guaranteed to be a correct TorchScript module.
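A minimal sketch of exporting a model with torch.jit.trace (the model and input shape are illustrative placeholders):
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = TinyModel().eval()
example_input = torch.randn(1, 3, 224, 224)   # the tracer runs this through the model once
traced = torch.jit.trace(model, example_input)
traced.save("model.pt")                       # usable as --serialized-file; no model.py needed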
Loading a TorchScript model (recommended, since the model.py file can then be omitted):
def _load_torchscript_model(self, model_pt_path):
    return torch.jit.load(model_pt_path, map_location=self.map_location)
torch.jit.script
torch.jit.script can be used as a decorator that compiles your code into the TorchScript language. The resulting model is more verbose (it carries more information) but more general; with minor modifications it can support most PyTorch models. It can also be called as a function: pass the eager model directly to torch.jit.script(), with no example inputs required. It supports control flow and some Python data structures, but it omits constant nodes and needs type annotations; if no type is given, Tensor is assumed by default.
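A minimal sketch contrasting this with tracing: the data-dependent branch below would be frozen by torch.jit.trace but is preserved by torch.jit.script (the model is illustrative):
import torch
import torch.nn as nn

class GatedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # data-dependent control flow that tracing cannot capture
        if x.sum() > 0:
            return self.linear(x)
        return -x

scripted = torch.jit.script(GatedModel().eval())  # no example input needed
scripted.save("scripted_model.pt")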
Loading an eager-mode model:
def _load_pickled_model(self, model_dir, model_file, model_pt_path):
    model_def_path = os.path.join(model_dir, model_file)
    if not os.path.isfile(model_def_path):
        raise RuntimeError("Missing the model.py file")

    module = importlib.import_module(model_file.split(".")[0])
    model_class_definitions = list_classes_from_module(module)
    if len(model_class_definitions) != 1:
        raise ValueError(
            "Expected only one class as model definition. {}".format(
                model_class_definitions
            )
        )

    model_class = model_class_definitions[0]
    state_dict = torch.load(model_pt_path, map_location=self.map_location)
    model = model_class()
    model.load_state_dict(state_dict)
    return model
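To produce the files this loader expects, save only the state_dict during training, for example (the model class and file names are illustrative):
import torch
from model import TinyModel  # model.py is passed to the archiver via --model-file

model = TinyModel()
# ... train the model ...
torch.save(model.state_dict(), "model.pt")  # passed via --serialized-file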
2.5.3 preprocess: preparing the image input
Preprocessing usually applies basic transforms to the incoming data, such as ToTensor, Resize, and Normalize.
image_processing = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]
)

def preprocess(self, data):
    """The preprocess function of MODNet program converts the input data to a float tensor

    Args:
        data (List): Input data from the request is in the form of a Tensor

    Returns:
        list : The preprocess function returns the input image as a list of float tensors.
    """
    images = []
    for row in data:
        # Compat layer: normally the envelope should just return the data
        # directly, but older versions of Torchserve didn't have envelope.
        image = row.get("data") or row.get("body")
        if isinstance(image, str):
            # if the image is a string of bytesarray.
            image = base64.b64decode(image)
        # If the image is sent as bytesarray
        if isinstance(image, (bytearray, bytes)):
            image = Image.open(io.BytesIO(image))
            image = self.image_processing(image)
        else:
            # if the image is a list
            image = torch.FloatTensor(image)
        images.append(image)
    return torch.stack(images).to(self.device)
2.5.4 inference: running the model
The model computation usually happens here; this example is from a MODNet network. The interpolation is done in this method so the original image size is readily available for restoring the output.
def inference(self, data, *args, **kwargs):
    """
    The Inference Function is used to make a prediction call on the given input request.
    The user needs to override the inference function to customize it.

    Args:
        data (Torch Tensor): A Torch Tensor is passed to make the Inference Request.
            The shape should match the model input shape.

    Returns:
        Torch Tensor : The Predicted Torch Tensor is returned in this function.
    """
    ref_size = 512

    # resize data for input
    im_b, im_c, im_h, im_w = data.shape
    if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        elif im_w < im_h:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)
    else:
        im_rh = im_h
        im_rw = im_w

    im_rw = im_rw - im_rw % 32
    im_rh = im_rh - im_rh % 32
    data = F.interpolate(data, size=(im_rh, im_rw), mode='area')

    marshalled_data = data.to(self.device)
    marshalled_data = marshalled_data * 2 - 1
    with torch.no_grad():
        results = self.model(marshalled_data, *args, **kwargs)

    results = F.interpolate(results, size=(im_h, im_w), mode='area')
    results = results[0][0].data.cpu().numpy()
    return results
2.5.5 postprocess: postprocessing the result
The postprocess function generally handles the model's output and returns it in a format TorchServe can send back. It must return a list, and because TorchServe responses are JSON, the contents must be JSON-serializable (hence the tolist() call below).
def postprocess(self, data):
    """
    Create an image (jpeg) using the output tensor.
    """
    logging.info("Successfully model process done")

    save_time = time.time()
    print('/home/abc/deep_learning/serve-master2/pic/' + str(save_time) + "_fore.png")
    cv2.imwrite(
        '/home/abc/deep_learning/serve-master2/pic/' + str(save_time) + "_fore.png",
        cv2.merge(data),
    )
    logging.info("Successfully save data")

    return data.tolist()
Note that the TorchServe response must be JSON-formatted, with the payload as a list.
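Once the model is served, a prediction request against the standard inference API could look like this (host, port, model name, and file name are illustrative):
curl http://127.0.0.1:8080/predictions/modnet -T input.png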
2.6 Generating a .mar model archive with torch-model-archiver
Use the model archiver tool to create a model archive that TorchServe can serve:
torch-model-archiver \
    --model-name <model-name> \
    --version <model_version_number> \
    --handler model_handler[:<entry_point_function_name>] \
    [--model-file <path_to_model_architecture_file>] \
    --serialized-file <path_to_state_dict_file> \
    [--extra-files <comma_separated_additional_files>] \
    [--export-path <output-dir> \
    --model-path <model_dir>] \
    [--runtime python3]
Items in [ ] are optional. (Note that each '\' must be separated from the preceding argument by a space; this is an easy mistake to make.)
Example usage:
torch-model-archiver -f \
    --model-name modnet \
    --version 1.0 \
    --serialized-file modnet.pt \
    --export-path model_store \
    --handler /home/abc/deep_learning/serve-master2/examples/modnet/modnet_handler.py
Common options:
--model-name: name of the generated model
--version: version number
--serialized-file: the TorchScript or eager-mode weights file
--export-path: output directory
--handler: the Python handler file
Command to start TorchServe:
torchserve --start \
    --model-store model_store \
    --models modnet=modnet.mar \
    --no-config-snapshots
--no-config-snapshots: do not create config snapshots
--model-store: the model store directory
--models: model_name=generated .mar file
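After starting, you can check that the model is registered through the management API, and stop the server when done (default ports assumed):
curl http://127.0.0.1:8081/models   # list registered models via the management API
torchserve --stop                   # shut down the server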
Using multiple GPUs
With multiple GPUs, TorchServe assigns GPUs to workers in round-robin order. The following code sets the device accordingly:
import torch

class ModelHandler(object):
    """
    A base Model handler implementation.
    """

    def __init__(self):
        self.device = None

    def initialize(self, context):
        properties = context.system_properties
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id"))
            if torch.cuda.is_available()
            else "cpu"
        )
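To cap how many GPUs TorchServe uses for this round-robin assignment, set the number_of_gpu property in config.properties (the value here is illustrative):
# config.properties
number_of_gpu=2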
3 Multi-Model Deployment with Workflows
A workflow specification is a YAML file that provides the details of the models to execute and a DAG defining the data flow.
The YAML file has several sections:
models: global model parameters
m1, m2, m3: per-model parameters, which override the corresponding global parameters
dag: describes the structure of the workflow, i.e. how the nodes connect
models:
    #global model params
    min-workers: 1
    max-workers: 4
    batch-size: 3
    max-batch-delay: 5000
    retry-attempts: 3
    timeout-ms: 5000
    m1:
        url: model1.mar  #local or public URI
        min-workers: 1   #override the global params
        max-workers: 2
        batch-size: 4
    m2:
        url: model2.mar
    m3:
        url: model3.mar
        batch-size: 3
    m4:
        url: model4.mar
dag:
    pre_processing: [m1]
    m1: [m2]
    m2: [m3]
    m3: [m4]
    m4: [postprocessing]
The dag above describes the following sequential flow:
input -> pre_processing -> m1 -> m2 -> m3 -> m4 -> postprocessing
(A DAG can also branch: for example, fan out from preprocessing to model1 and model2 in parallel and then combine their outputs in an aggregate function.)
For multi-model workflows, you generally only need to add the YAML file plus preprocess and postprocess functions, for example:
def preprocess(data, context):
    pass

def postprocess(data, context):
    pass
Example of generating the .war workflow archive:
torch-workflow-archiver \
    -f \
    --workflow-name nmt_wf_dual \
    --spec-file nmt_workflow_dualtranslation.yaml \
    --handler nmt_workflow_handler_dualtranslation.py \
    --export-path wf_store/
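After generating the .war file, the workflow can be registered and invoked through TorchServe's workflow APIs, roughly like this (host, ports, and file names are illustrative):
# register the workflow through the management API
curl -X POST "http://127.0.0.1:8081/workflows?url=nmt_wf_dual.war"
# run an inference through the workflow endpoint (the workflow name comes from the spec)
curl http://127.0.0.1:8080/wfpredict/nmt_wf_dual -T input.txt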
-f: force-overwrite an existing .war file
--spec-file: path to the YAML spec file
--handler: path to the handler file
--export-path: output directory
Further reading
For more detailed configuration options, see the official TorchServe documentation:
https://pytorch.org/serve/