0 TorchServe
TorchServe is a high-performance, flexible, and easy-to-use tool for serving PyTorch eager mode and TorchScript models.
1 TorchServe Usage Tutorial
There are three ways to configure TorchServe:
1.1 Environment variables
TorchServe can be configured by setting the following environment variables:
JAVA_HOME
PYTHONPATH
TS_CONFIG_FILE
LOG_LOCATION
METRICS_LOCATION
Environment variables take the highest precedence among the configuration sources.
1.2 Command-line arguments
The following command-line arguments customize TorchServe:
--ts-config: If the TS_CONFIG_FILE environment variable is not set, TorchServe loads the configuration file given by this argument.
--model-store: Overrides the model_store property in config.properties.
--models: Overrides the load_models property in config.properties.
--log-config: Overrides the default log4j2.xml.
--foreground: Runs TorchServe in the foreground. If this option is disabled, TorchServe runs in the background.
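For example, a typical invocation combining these flags might look like the following (the file and directory names are illustrative):
torchserve --start \
    --ts-config config.properties \
    --model-store model_store \
    --models all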
1.3 Configuration file
TorchServe stores its configuration in a config.properties file. TorchServe looks for this file in the following order of precedence:
If the TS_CONFIG_FILE environment variable is set, TorchServe loads the configuration from the path it specifies.
If the --ts-config argument is passed to torchserve, TorchServe loads the configuration from the path given by that argument.
If there is a config.properties file in the folder from which you call torchserve, TorchServe loads it from the current working directory.
If none of the above is provided, TorchServe loads a built-in configuration with default values.
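A minimal config.properties sketch (the addresses and paths are illustrative; the keys are standard TorchServe properties):
# config.properties
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
model_store=model_store
load_models=all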
2 Customizing TorchServe with Your Own Model
2.1 Custom handler workflow
You customize TorchServe's behavior with a Python script that is packaged alongside the model by the model archiver. TorchServe executes this code at runtime.
The TorchServe handler flow:
1. Initialize the model instance:
def initialize(self, context)
2. Preprocess the input data before it is sent to the model for inference or for Captum explanations:
def preprocess(self, data)
3. Customize how the model is called for inference or explanations:
def inference(self, model_input)
4. Postprocess the model's output after the response is returned (this usually needs to be customized):
def postprocess(self, inference_output)
2.2 Basic request parameters
data - the input data from the request that needs to be processed
context - the TorchServe context information. It provides, among other things: model_name, model_dir, manifest, batch_size, gpu
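As a brief illustration, a handler can read these values from the context like this (a sketch based on the TorchServe context object):
# inside a handler method such as initialize(self, context):
properties = context.system_properties
model_dir = properties.get("model_dir")    # directory the .mar archive was extracted to
gpu_id = properties.get("gpu_id")          # GPU id assigned to this worker, if any
batch_size = properties.get("batch_size")  # configured batch size for this model
manifest = context.manifest                # archive metadata, e.g. serializedFile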
2.3 Starting from BaseHandler
The stock BaseHandler already implements most of the functionality; in most cases you only need to override the preprocess and postprocess methods, as in the sketch below.
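A minimal sketch of such a handler, assuming the client sends numeric arrays in the request body (the decoding logic is illustrative):
import torch
from ts.torch_handler.base_handler import BaseHandler

class MyHandler(BaseHandler):
    def preprocess(self, data):
        # turn the raw request payloads into a float tensor batch
        rows = [row.get("data") or row.get("body") for row in data]
        return torch.as_tensor(rows, dtype=torch.float32).to(self.device)

    def postprocess(self, inference_output):
        # TorchServe expects a JSON-serializable list, one entry per request
        return inference_output.cpu().tolist()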
For reference, this is how BaseHandler's initialize loads the model weights:
# Initialize model weights
def initialize(self, context):
    """First try to load torchscript else load eager mode state_dict based model"""
    properties = context.system_properties
    self.map_location = (
        "cuda"
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else "cpu"
    )
    self.device = torch.device(
        self.map_location + ":" + str(properties.get("gpu_id"))
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else self.map_location
    )
    self.manifest = context.manifest

    model_dir = properties.get("model_dir")
    serialized_file = self.manifest["model"]["serializedFile"]
    model_pt_path = os.path.join(model_dir, serialized_file)
    if not os.path.isfile(model_pt_path):
        raise RuntimeError("Missing the model.pt file")

    # model def file
    model_file = self.manifest["model"].get("modelFile", "")
    if model_file:
        logger.debug("Loading eager model")
        self.model = self._load_pickled_model(model_dir, model_file, model_pt_path)
    else:
        logger.debug("Loading torchscript model")
        self.model = self._load_torchscript_model(model_pt_path)

    self.model.to(self.device)
    self.model.eval()

    logger.debug("Model file %s loaded successfully", model_pt_path)
    self.initialized = True
2.4 Advanced custom handlers
2.4.1 Returning a custom error code (module level)
from ts.utils.util import PredictionException

def handle(data, context):
    # Some unexpected error - returning error code 513
    raise PredictionException("Some Prediction Error", 513)
2.4.2 Returning a custom error code (class level)
from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import PredictionException

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def handle(self, data, context):
        # Some unexpected error - returning error code 513
        raise PredictionException("Some Prediction Error", 513)
2.5 Writing a complete custom handler from BaseHandler
# custom handler file
# model_handler.py
"""
ModelHandler defines a custom model handler.
"""
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    """
    A custom model handler implementation.
    """

    def __init__(self):
        self._context = None
        self.initialized = False
        self.explain = False
        self.target = 0

    def initialize(self, context):
        """
        Initialize model. This will be called during model loading time
        :param context: Initial context contains model server system properties.
        :return:
        """
        self._context = context
        self.initialized = True
        # load the model, refer 'custom handler class' above for details

    def preprocess(self, data):
        """
        Transform raw input into model input data.
        :param batch: list of raw requests, should match batch size
        :return: list of preprocessed model input data
        """
        # Take the input data and make it inference ready
        preprocessed_data = data[0].get("data")
        if preprocessed_data is None:
            preprocessed_data = data[0].get("body")
        return preprocessed_data

    def inference(self, model_input):
        """
        Internal inference methods
        :param model_input: transformed model input data
        :return: list of inference output in NDArray
        """
        # Do some inference call to engine here and return output
        model_output = self.model.forward(model_input)
        return model_output

    def postprocess(self, inference_output):
        """
        Return inference result.
        :param inference_output: list of inference output
        :return: list of predict results
        """
        # Take output from network and post-process to desired format
        postprocess_output = inference_output
        return postprocess_output

    def handle(self, data, context):
        """
        Invoke by TorchServe for prediction request.
        Do pre-processing of data, prediction using model and postprocessing of prediction output
        :param data: Input data for prediction
        :param context: Initial context contains model server system properties.
        :return: prediction output
        """
        model_input = self.preprocess(data)
        model_output = self.inference(model_input)
        return self.postprocess(model_output)
ModelHandler inherits from BaseHandler; the core entry point is the handle() method.
2.5.1 __init__ initializes handler state
def __init__(self):
    self._context = None
    self.initialized = False
    self.explain = False
    self.target = 0
2.5.2 initialize loads the model weights
def initialize(self, context):
    """First try to load torchscript else load eager mode state_dict based model"""
    properties = context.system_properties
    self.map_location = (
        "cuda"
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else "cpu"
    )
    self.device = torch.device(
        self.map_location + ":" + str(properties.get("gpu_id"))
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else self.map_location
    )
    self.manifest = context.manifest

    model_dir = properties.get("model_dir")
    serialized_file = self.manifest["model"]["serializedFile"]
    model_pt_path = os.path.join(model_dir, serialized_file)
    if not os.path.isfile(model_pt_path):
        raise RuntimeError("Missing the model.pt file")

    # model def file
    model_file = self.manifest["model"].get("modelFile", "")
    if model_file:
        logger.debug("Loading eager model")
        self.model = self._load_pickled_model(model_dir, model_file, model_pt_path)
    else:
        logger.debug("Loading torchscript model")
        self.model = self._load_torchscript_model(model_pt_path)

    self.model.to(self.device)
    self.model.eval()

    logger.debug("Model file %s loaded successfully", model_pt_path)
    self.initialized = True
torch.jit.trace
torch.jit.trace() takes a trained eager-mode model plus example inputs, runs the inputs through the eager model once, and records the tensor operations that execute; the recording is saved as a TorchScript module.
Its main drawbacks are that it does not support control flow, data structures (list, dict, etc.), or Python constructs, and some operations may be recorded incorrectly in the TorchScript module without any warning, so the output is not guaranteed to be a correct TorchScript module.
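A minimal sketch of exporting a model with torch.jit.trace (the model and input shape are illustrative placeholders):
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = TinyModel().eval()
example_input = torch.randn(1, 3, 224, 224)   # the tracer runs this through the model once
traced = torch.jit.trace(model, example_input)
traced.save("model.pt")                       # usable as --serialized-file; no model.py needed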
Loading a TorchScript model (recommended, since the model.py file can then be omitted):
def _load_torchscript_model(self, model_pt_path):
    return torch.jit.load(model_pt_path, map_location=self.map_location)
torch.jit.script
torch.jit.script can be used as a decorator that compiles your code into the TorchScript language. The resulting model is more verbose (it carries more information) but more general; with minor modifications it can support most PyTorch models. It can also be called as a function: pass the eager model directly to torch.jit.script(), with no example inputs required. It supports control flow and some Python data structures, but it omits constant nodes and needs type annotations; if no type is given, Tensor is assumed by default.
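A minimal sketch contrasting this with tracing: the data-dependent branch below would be frozen by torch.jit.trace but is preserved by torch.jit.script (the model is illustrative):
import torch
import torch.nn as nn

class GatedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # data-dependent control flow that tracing cannot capture
        if x.sum() > 0:
            return self.linear(x)
        return -x

scripted = torch.jit.script(GatedModel().eval())  # no example input needed
scripted.save("scripted_model.pt")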
Loading an eager-mode model:
def _load_pickled_model(self, model_dir, model_file, model_pt_path):
    model_def_path = os.path.join(model_dir, model_file)
    if not os.path.isfile(model_def_path):
        raise RuntimeError("Missing the model.py file")

    module = importlib.import_module(model_file.split(".")[0])
    model_class_definitions = list_classes_from_module(module)
    if len(model_class_definitions) != 1:
        raise ValueError(
            "Expected only one class as model definition. {}".format(
                model_class_definitions
            )
        )

    model_class = model_class_definitions[0]
    state_dict = torch.load(model_pt_path, map_location=self.map_location)
    model = model_class()
    model.load_state_dict(state_dict)
    return model
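To produce the files this loader expects, save only the state_dict during training, for example (the model class and file names are illustrative):
import torch
from model import TinyModel  # model.py is passed to the archiver via --model-file

model = TinyModel()
# ... train the model ...
torch.save(model.state_dict(), "model.pt")  # passed via --serialized-file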
2.5.3 preprocess: preparing the image input
Preprocessing usually applies basic transforms to the incoming data, such as ToTensor, Resize, and Normalize.
image_processing = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]
)

def preprocess(self, data):
    """The preprocess function of MODNet program converts the input data to a float tensor

    Args:
        data (List): Input data from the request is in the form of a Tensor

    Returns:
        list : The preprocess function returns the input image as a list of float tensors.
    """
    images = []
    for row in data:
        # Compat layer: normally the envelope should just return the data
        # directly, but older versions of Torchserve didn't have envelope.
        image = row.get("data") or row.get("body")
        if isinstance(image, str):
            # if the image is a string of bytesarray.
            image = base64.b64decode(image)
        # If the image is sent as bytesarray
        if isinstance(image, (bytearray, bytes)):
            image = Image.open(io.BytesIO(image))
            image = self.image_processing(image)
        else:
            # if the image is a list
            image = torch.FloatTensor(image)
        images.append(image)
    return torch.stack(images).to(self.device)
2.5.4 inference: running the model
The model computation usually happens here; this example is from a MODNet network. The interpolation is done in this method so the original image size is readily available for restoring the output.
def inference(self, data, *args, **kwargs):
    """
    The Inference Function is used to make a prediction call on the given input request.
    The user needs to override the inference function to customize it.

    Args:
        data (Torch Tensor): A Torch Tensor is passed to make the Inference Request.
            The shape should match the model input shape.

    Returns:
        Torch Tensor : The Predicted Torch Tensor is returned in this function.
    """
    ref_size = 512

    # resize data for input
    im_b, im_c, im_h, im_w = data.shape
    if max(im_h, im_w) < ref_size or min(im_h, im_w) > ref_size:
        if im_w >= im_h:
            im_rh = ref_size
            im_rw = int(im_w / im_h * ref_size)
        elif im_w < im_h:
            im_rw = ref_size
            im_rh = int(im_h / im_w * ref_size)
    else:
        im_rh = im_h
        im_rw = im_w

    im_rw = im_rw - im_rw % 32
    im_rh = im_rh - im_rh % 32
    data = F.interpolate(data, size=(im_rh, im_rw), mode='area')

    marshalled_data = data.to(self.device)
    marshalled_data = marshalled_data * 2 - 1
    with torch.no_grad():
        results = self.model(marshalled_data, *args, **kwargs)

    results = F.interpolate(results, size=(im_h, im_w), mode='area')
    results = results[0][0].data.cpu().numpy()
    return results
2.5.5 postprocess: postprocessing the result
The postprocess function generally handles the model's output and returns it in a format TorchServe can send back. It must return a list, and because TorchServe responses are JSON, the contents must be JSON-serializable (hence the tolist() call below).
def postprocess(self, data):
    """
    Create an image (jpeg) using the output tensor.
    """
    logging.info("Successfully model process done")

    save_time = time.time()
    print('/home/abc/deep_learning/serve-master2/pic/' + str(save_time) + "_fore.png")
    cv2.imwrite(
        '/home/abc/deep_learning/serve-master2/pic/' + str(save_time) + "_fore.png",
        cv2.merge(data),
    )
    logging.info("Successfully save data")

    return data.tolist()
Note that the TorchServe response must be JSON-formatted, with the payload as a list.
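Once the model is served, a prediction request against the standard inference API could look like this (host, port, model name, and file name are illustrative):
curl http://127.0.0.1:8080/predictions/modnet -T input.png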
2.6 Generating a .mar model archive with torch-model-archiver
Use the model archiver tool to create a model archive that TorchServe can serve:
torch-model-archiver \
    --model-name <model-name> \
    --version <model_version_number> \
    --handler model_handler[:<entry_point_function_name>] \
    [--model-file <path_to_model_architecture_file>] \
    --serialized-file <path_to_state_dict_file> \
    [--extra-files <comma_separated_additional_files>] \
    [--export-path <output-dir> \
    --model-path <model_dir>] \
    [--runtime python3]
Items in [ ] are optional. (Note that each '\' must be separated from the preceding argument by a space; this is an easy mistake to make.)
Example usage:
torch-model-archiver -f \
    --model-name modnet \
    --version 1.0 \
    --serialized-file modnet.pt \
    --export-path model_store \
    --handler /home/abc/deep_learning/serve-master2/examples/modnet/modnet_handler.py
Common options:
--model-name: name of the generated model
--version: version number
--serialized-file: the TorchScript or eager-mode weights file
--export-path: output directory
--handler: the Python handler file
Command to start TorchServe:
torchserve --start \
    --model-store model_store \
    --models modnet=modnet.mar \
    --no-config-snapshots
--no-config-snapshots: do not create config snapshots
--model-store: the model store directory
--models: model_name=generated .mar file
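After starting, you can check that the model is registered through the management API, and stop the server when done (default ports assumed):
curl http://127.0.0.1:8081/models   # list registered models via the management API
torchserve --stop                   # shut down the server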
Using multiple GPUs
With multiple GPUs, TorchServe assigns GPUs to workers in round-robin order. The following code sets the device accordingly:
import torch

class ModelHandler(object):
    """
    A base Model handler implementation.
    """

    def __init__(self):
        self.device = None

    def initialize(self, context):
        properties = context.system_properties
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id"))
            if torch.cuda.is_available()
            else "cpu"
        )
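To cap how many GPUs TorchServe uses for this round-robin assignment, set the number_of_gpu property in config.properties (the value here is illustrative):
# config.properties
number_of_gpu=2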
3 Multi-Model Deployment with Workflows
A workflow specification is a YAML file that provides the details of the models to execute and a DAG defining the data flow.
The YAML file has several sections:
models: global model parameters
m1, m2, m3: per-model parameters, which override the corresponding global parameters
dag: describes the structure of the workflow, i.e. how the nodes connect
models:
    #global model params
    min-workers: 1
    max-workers: 4
    batch-size: 3
    max-batch-delay: 5000
    retry-attempts: 3
    timeout-ms: 5000
    m1:
        url: model1.mar  #local or public URI
        min-workers: 1   #override the global params
        max-workers: 2
        batch-size: 4
    m2:
        url: model2.mar
    m3:
        url: model3.mar
        batch-size: 3
    m4:
        url: model4.mar
dag:
    pre_processing: [m1]
    m1: [m2]
    m2: [m3]
    m3: [m4]
    m4: [postprocessing]
The dag above describes the following sequential flow:
input -> pre_processing -> m1 -> m2 -> m3 -> m4 -> postprocessing
(A DAG can also branch: for example, fan out from preprocessing to model1 and model2 in parallel and then combine their outputs in an aggregate function.)
For multi-model workflows, you generally only need to add the YAML file plus preprocess and postprocess functions, for example:
def preprocess(data, context):
    pass

def postprocess(data, context):
    pass
Example of generating the .war workflow archive:
torch-workflow-archiver \
    -f \
    --workflow-name nmt_wf_dual \
    --spec-file nmt_workflow_dualtranslation.yaml \
    --handler nmt_workflow_handler_dualtranslation.py \
    --export-path wf_store/
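After generating the .war file, the workflow can be registered and invoked through TorchServe's workflow APIs, roughly like this (host, ports, and file names are illustrative):
# register the workflow through the management API
curl -X POST "http://127.0.0.1:8081/workflows?url=nmt_wf_dual.war"
# run an inference through the workflow endpoint (the workflow name comes from the spec)
curl http://127.0.0.1:8080/wfpredict/nmt_wf_dual -T input.txt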
-f: force-overwrite an existing .war file
--spec-file: path to the YAML spec file
--handler: path to the handler file
--export-path: output directory
Further reading
For more detailed configuration options, see the official TorchServe documentation:
https://pytorch.org/serve/