Triton Inference Server (TensorRT) python_backend and tritonclient

server

  • The Triton backend for Python. The goal of Python backend is to let you serve models written in Python by Triton Inference Server without having to write any C++ code.

Model Config File

  • Every Python Triton model must provide a config.pbtxt file describing the model configuration. The backend field of the model configuration must be set to python, and the platform field must not be set. The file structure is as follows:
  • Single-module model
models
└── add_sub
    ├── 1
    │   └── model.py
    └── config.pbtxt
  • Multi-module model
├─decoder
│  │  config.pbtxt
│  │
│  └─1
│          decoder.plan
│
├─encoder
│  │  config.pbtxt
│  │
│  └─1
│          encoder.plan
│
└─main_name
    │  config.pbtxt
    │
    └─1
          model.py
          other files used by model.py

Configuring config.pbtxt

        A config.pbtxt must be written for each submodule as well as for the model.py module. For the submodules you can refer to the linked references; in the config.pbtxt for model.py, a size can be set to -1 (variable). Naming is covered in the sections below. A minimal sketch follows.
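For reference, here is a minimal config.pbtxt sketch for a Python-backend model, loosely following the add_sub example from the python_backend repository; the tensor names, dims, and instance_group are illustrative and must match your own model.py:

name: "add_sub"
backend: "python"        # required for Python models; the platform field stays unset

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]          # a dimension can be -1 for variable size, e.g. dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 4 ]
  }
]

instance_group [{ kind: KIND_CPU }]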

model.py

Keyword explanations
class TritonPythonModel: your Python model must use this exact class name. Every Python model you create must have "TritonPythonModel" as its class name.
def initialize(self, args): 'initialize' is called only once, when the model is being loaded. Implementing 'initialize' is optional; it lets the model set up any state associated with it.
def execute(self, requests): 'execute' is called whenever an inference request is made for this model. It receives a list of pb_utils.InferenceRequest objects as its only argument. Depending on the batching configuration in use (e.g. dynamic batching), 'requests' may contain multiple requests. Every Python model must create exactly one pb_utils.InferenceResponse for every pb_utils.InferenceRequest in 'requests'. If there is an error, you can set the error argument when creating the pb_utils.InferenceResponse.
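Putting the three keywords together, a minimal model.py skeleton might look like this (a sketch only; the add-one computation and the tensor names INPUT0/OUTPUT0 are placeholders that must match your config.pbtxt):

import json
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # called once at model load; all values in args are strings
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        # one pb_utils.InferenceResponse per pb_utils.InferenceRequest
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out_0 = pb_utils.Tensor("OUTPUT0", in_0.as_numpy() + 1.0)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_0]))
        return responses

    def finalize(self):
        # optional: called once when the model is unloaded
        pass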

initialize

  • Inside this function you are given a variable args, which is a Python dictionary. Both the keys and the values of this dictionary are strings. The keys and their descriptions:

key                      | description
model_config             | JSON string containing the model configuration
model_instance_kind      | string containing the model instance kind
model_instance_device_id | string containing the model instance device ID
model_repository         | model repository path
model_version            | model version
model_name               | model name

For example: self.device = "cpu" if args["model_instance_kind"] == "CPU" else "cuda"
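A sketch of an initialize that consumes these keys; the helpers pb_utils.get_output_config_by_name and pb_utils.triton_string_to_numpy come from triton_python_backend_utils.py, and the output name OUTPUT0 is an assumption:

import json
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # model_config arrives as a JSON string and must be decoded first
        self.model_config = json.loads(args["model_config"])
        self.device = "cpu" if args["model_instance_kind"] == "CPU" else "cuda"
        # resolve an output's Triton data type to a numpy dtype for later use
        output0_config = pb_utils.get_output_config_by_name(self.model_config, "OUTPUT0")
        self.output0_dtype = pb_utils.triton_string_to_numpy(output0_config["data_type"])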

execute

This is the most generic way to implement your model, and it requires the execute function to return exactly one response per request. In this mode, execute must return a list of responses whose length matches that of requests. The workflow in this mode is:

  • The execute function receives a batch of pb_utils.InferenceRequest objects as a length-N array.

  • Perform inference on each pb_utils.InferenceRequest and append the corresponding pb_utils.InferenceResponse to a response list.

  • Return the response list.

The returned response list must have length N. Each element in the list should be the response for the corresponding element in the request array. Each element must contain a response (a response can be either output tensors or an error); an element cannot be None.
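A minimal sketch of this default mode, assuming numpy-level work and the illustrative tensor names INPUT0/OUTPUT0; note that a failed element gets an error response via pb_utils.TritonError rather than None:

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            try:
                in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
                out_0 = pb_utils.Tensor("OUTPUT0", np.sqrt(np.abs(in_0)))
                responses.append(pb_utils.InferenceResponse(output_tensors=[out_0]))
            except Exception as e:
                # never append None; report the failure as an error response
                responses.append(pb_utils.InferenceResponse(
                    output_tensors=[], error=pb_utils.TritonError(str(e))))
        return responses  # len(responses) == len(requests) == N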

Getting input values: get_input_tensor_by_name
  • The client uses tritonclient.http.InferInput's set_data_from_numpy to set the value of namefrommainconfig (the name comes from main_name -> config.pbtxt)
  • The backend reads the value with pb_utils.get_input_tensor_by_name(request, "namefrommainconfig").as_numpy().tolist()
# https://github.com/triton-inference-server/python_backend/blob/main/src/resources/triton_python_backend_utils.py
def get_input_tensor_by_name(inference_request, name):
    """Find an input Tensor in the inference_request that has the given
    name
    Parameters
    ----------
    inference_request : InferenceRequest
        InferenceRequest object
    name : str
        name of the input Tensor object
    Returns
    -------
    Tensor
        The input Tensor with the specified name, or None if no
        input Tensor with this name exists
    """
  • in_0 = pb_utils.Tensor("INPUT__0", input_ids.numpy().astype(self.input0_dtype)); the name INPUT__0 is taken from the config file in the model_name='facenet' folder
pb_utils.InferenceRequest
  • Note the inputs=[pb_utils.Tensor(self.Facenet_inputs[0], face_img.astype(np.float32))] in the example below
  • tritonclient.utils.InferenceServerException: Failed to process the request(s) for model instance 'XXXXX', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. c_python_backend_utils.InferenceRequest(request_id: str = '', correlation_id: int = 0, inputs: List[triton::backend::python::PbTensor], requested_output_names: List[str], model_name: str, model_version: int = -1, flags: int = 0)
# https://www.cnblogs.com/zzk0/p/15535828.html
inference_request = pb_utils.InferenceRequest(
    model_name='facenet',
    requested_output_names=[self.Facenet_outputs[0]],
    inputs=[pb_utils.Tensor(self.Facenet_inputs[0], face_img.astype(np.float32))]
)
inference_response = inference_request.exec()
pre = utils.pb_tensor_to_numpy(pb_utils.get_output_tensor_by_name(inference_response, self.Facenet_outputs[0]))


To use a GPU output you must use PyTorch's to_dlpack to put the GPU tensor's contents into shared memory, and then from_dlpack to turn that shared-memory content into a PyTorch tensor:
output = pb_utils.get_output_tensor_by_name(inference_response, "your_requested_output_names_from_config")
out: torch.Tensor = torch.from_dlpack(output.to_dlpack())
  • torch.from_dlpack: https://pytorch.org/docs/stable/dlpack.html

  • pb_utils.Tensor.from_dlpack: there are two ways to convert a Triton tensor to a PyTorch tensor:

            input_ids = from_dlpack(in_0.to_dlpack())

            input_ids = torch.from_numpy(in_0.as_numpy())

    Of the two, to_dlpack/from_dlpack has lower overhead.
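A sketch showing both directions inside execute: from_dlpack turns a Triton tensor into a torch.Tensor without copying, and pb_utils.Tensor.from_dlpack builds the output the same way (the tensor names INPUT__0/OUTPUT__0 and the computation are illustrative):

from torch.utils.dlpack import from_dlpack, to_dlpack
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT__0")
            x = from_dlpack(in_0.to_dlpack())  # Triton tensor -> torch.Tensor, zero-copy
            y = x * 2                          # placeholder computation
            out_0 = pb_utils.Tensor.from_dlpack("OUTPUT__0", to_dlpack(y))  # torch -> Triton
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_0]))
        return responses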

Getting model output: get_output_tensor_by_name
# pb_utils.get_output_tensor_by_name(inference_response, self.Facenet_outputs[0])
# https://github.com/triton-inference-server/python_backend/blob/main/src/resources/triton_python_backend_utils.py
def get_output_tensor_by_name(inference_response, name):
    """Find an output Tensor in the inference_response that has the given
    name
    Parameters
    ----------
    inference_response : InferenceResponse
        InferenceResponse object
    name : str
        name of the output Tensor object
    Returns
    -------
    Tensor
        The output Tensor with the specified name, or None if no
        output Tensor with this name exists
    """
    output_tensors = inference_response.output_tensors()
    for output_tensor in output_tensors:
        if output_tensor.name() == name:
            return output_tensor

    return None

Error output

inference_response = inference_request.exec()
if inference_response.has_error():
    print(inference_response.error().message())
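When the BLS call fails you usually want to abort the request; one common pattern (a sketch) is to raise pb_utils.TritonModelException so that Triton returns the error to the caller:

inference_response = inference_request.exec()
if inference_response.has_error():
    # propagate the BLS error back to the client instead of continuing
    raise pb_utils.TritonModelException(inference_response.error().message())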

client

  • pip install tritonclient[all] https://github.com/triton-inference-server/client

  • grpc reference: https://programtalk.com/python-more-examples/tritonclient.grpc.InferInput/

  • http reference: https://www.cnblogs.com/zzk0/p/15535828.html

import numpy as np
import tritonclient.http as httpclient


if __name__ == '__main__':
    triton_client = httpclient.InferenceServerClient(url='127.0.0.1:8000')

    inputs = []
    inputs.append(httpclient.InferInput('INPUT0', [4], "FP32"))
    inputs.append(httpclient.InferInput('INPUT1', [4], "FP32"))
    input_data0 = np.random.randn(4).astype(np.float32)
    input_data1 = np.random.randn(4).astype(np.float32)
    inputs[0].set_data_from_numpy(input_data0, binary_data=False)
    inputs[1].set_data_from_numpy(input_data1, binary_data=False)
    outputs = []
    outputs.append(httpclient.InferRequestedOutput('OUTPUT0', binary_data=False))
    outputs.append(httpclient.InferRequestedOutput('OUTPUT1', binary_data=False))

    results = triton_client.infer('example_python', inputs=inputs, outputs=outputs)
    output_data0 = results.as_numpy('OUTPUT0')
    output_data1 = results.as_numpy('OUTPUT1')

    print(input_data0)
    print(input_data1)
    print(output_data0)
    print(output_data1)

  • http reference: https://github.com/kamalkraj/stable-diffusion-tritonserver/blob/master/Inference.ipynb (mainly covers setting up input/output placeholders and running inference)
import numpy as np
import tritonclient.http
# model
model_name = "stable_diffusion"
url = "0.0.0.0:8000"
model_version = "1"
batch_size = 1
# model input params
prompt = "A small cabin on top of a snowy mountain in the style of Disney, artstation"
samples = 1 # no.of images to generate
steps = 45
guidance_scale = 7.5
seed = 1024
triton_client = tritonclient.http.InferenceServerClient(url=url, verbose=False)
assert triton_client.is_model_ready(
    model_name=model_name, model_version=model_version
), f"model {model_name} not yet ready"

model_metadata = triton_client.get_model_metadata(model_name=model_name, model_version=model_version)
model_config = triton_client.get_model_config(model_name=model_name, model_version=model_version)
# Input placeholder
prompt_in = tritonclient.http.InferInput(name="PROMPT", shape=(batch_size,), datatype="BYTES")  # for an image input, the shape could instead be e.g. shape=(batch_size, 64, 64, 3)
samples_in = tritonclient.http.InferInput("SAMPLES", (batch_size, ), "INT32")
steps_in = tritonclient.http.InferInput("STEPS", (batch_size, ), "INT32")
guidance_scale_in = tritonclient.http.InferInput("GUIDANCE_SCALE", (batch_size, ), "FP32")
seed_in = tritonclient.http.InferInput("SEED", (batch_size, ), "INT64")

images = tritonclient.http.InferRequestedOutput(name="IMAGES", binary_data=False)
# Setting inputs
prompt_in.set_data_from_numpy(np.asarray([prompt] * batch_size, dtype=object))
samples_in.set_data_from_numpy(np.asarray([samples], dtype=np.int32))
steps_in.set_data_from_numpy(np.asarray([steps], dtype=np.int32))
guidance_scale_in.set_data_from_numpy(np.asarray([guidance_scale], dtype=np.float32))
seed_in.set_data_from_numpy(np.asarray([seed], dtype=np.int64))

response = triton_client.infer(
    model_name=model_name, model_version=model_version, 
    inputs=[prompt_in,samples_in,steps_in,guidance_scale_in,seed_in], 
    outputs=[images]
)
# Notebook timing for the infer call:
# CPU times: user 92.3 ms, sys: 39.5 ms, total: 132 ms
# Wall time: 6.31 s
images = response.as_numpy("IMAGES")
from PIL import Image
if images.ndim == 3:
    images = images[None, ...]
images = (images * 255).round().astype("uint8")
pil_images = [Image.fromarray(image) for image in images]
def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size
    
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid
rows = 1 # change according to no.of samples 
cols = 1 # change according to no.of samples
# rows * cols == no.of samples
image_grid(pil_images, rows, cols)
  • The name of each tritonclient.http.InferInput must match the server-side config

Error: tritonclient.utils.InferenceServerException: got unexpected datatype BYTES from numpy array, expected FP32

  • Fix the datatype according to the message, e.g. tritonclient.http.InferInput(name="PROMPT", shape=(batch_size,), datatype="BYTES")

CG

# https://bobbyhadz.com/blog/python-unicodeencodeerror-ascii-codec-cant-encode-character-in-position
my_str = 'one ф'

# 👇️ encode str to bytes
my_bytes = my_str.encode('utf-8')
print(my_bytes)  # 👉️ b'one \xd1\x84'

# 👇️ decode bytes to str
my_str_again = my_bytes.decode('utf-8')
print(my_str_again)  # 👉️ "one ф"
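This matters for Triton because BYTES inputs and outputs travel as numpy object arrays of raw bytes; a sketch (the tensor name TEXT is illustrative) of encoding on the client and decoding in the Python backend:

import numpy as np
import tritonclient.http as httpclient

# client side: strings go into an object-dtype numpy array for a BYTES input
text_in = httpclient.InferInput(name="TEXT", shape=(1,), datatype="BYTES")
text_in.set_data_from_numpy(np.asarray(["one ф"], dtype=object))

# backend side (inside execute): values arrive as bytes and may need decoding
# texts = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
# decoded = [t.decode("utf-8") if isinstance(t, bytes) else t for t in texts]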

  • Note: if you don't know the right data type for the config file, you can write an arbitrary one first and then fix it according to the error message