Simple configuration examples
- https://github.com/triton-inference-server/server/tree/main/docs/examples/model_repository
Configuration file
- For TensorFlow SavedModel and ONNX models, the configuration file can be omitted by starting the server with
--strict-model-config=false
- For example:
tritonserver --model-repository=./model_repo/ --strict-model-config=false ……
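When a config.pbtxt is written out explicitly, a minimal file only needs the required parameters described below. A minimal sketch for a hypothetical ONNX classifier (the model name, tensor names, and shapes are placeholders, not taken from the linked examples):

name: "my_onnx_model"      # placeholder; must match the model's directory name
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "input"          # placeholder; must match the model's real input tensor
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"         # placeholder; must match the model's real output tensor
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]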
Required parameters
platform & backend
| | TensorRT | ONNXRT | TensorFlow | PyTorch | OpenVINO | Python | DALI | Custom |
|---|---|---|---|---|---|---|---|---|
| platform | tensorrt_plan | onnxruntime_onnx | tensorflow_graphdef / tensorflow_savedmodel (must) | pytorch_libtorch | - | - | - | - |
| backend | tensorrt | onnxruntime | tensorflow (optional) | pytorch | openvino | python | dali | <backend_name> |

For TensorRT, ONNXRT, and PyTorch either platform or backend can be specified; TensorFlow must set platform (backend is optional); the remaining backends are selected with backend only.
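For example, an ONNX Runtime model can be declared either way; only one of the two lines is needed in config.pbtxt:

platform: "onnxruntime_onnx"
backend: "onnxruntime"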
input, output, and max_batch_size
import torch

# Export with a dynamic batch dimension so that Triton can batch requests;
# model, image, and onnx_file are assumed to be defined earlier.
input_names = ['input']
output_names = ['output']
torch.onnx.export(model, image, onnx_file, verbose=False,
                  input_names=input_names, output_names=output_names,
                  opset_version=11,
                  dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}})
- case 0: max_batch_size = 8, input shape = [N,3,224,224] with N <= 8. The batch dimension is added to the request automatically, so dims must not include the batch size.
- case 1: max_batch_size = 0, input shape = [3,224,224]. max_batch_size = 0 disables batching, so dims must include the batch dimension explicitly.
- case 2: name = INPUT__0 (double underscore, name__index, the naming convention for PyTorch models), input shape = [3,-1,-1]: images such as [3,100,100] or [3,200,200] are both accepted.
- case 3: reshape { shape: [1,3,224,224] }: reshape the input so that the model receives this shape; used together with max_batch_size (see the sketch below).
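A sketch of the input block for case 2, assuming a hypothetical PyTorch model (the tensor name and data type are assumptions):

max_batch_size: 8
input [
  {
    name: "INPUT__0"       # PyTorch convention: <name>__<index>
    data_type: TYPE_FP32
    dims: [ 3, -1, -1 ]    # -1 accepts a variable height and width, e.g. [3,100,100] or [3,200,200]
  }
]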
Optional parameters
version policy
- version_policy: { all {} }
  - all: every version of the model in the model repository is available for inference.
- version_policy: { latest { num_versions: 1 } }
  - latest: only the latest version of the model; the latest version is the numerically greatest version number.
- version_policy: { specific { versions: [1,2] } }
  - specific: only the specifically listed model versions are available for inference.
instance_group - GPU resource allocation: run multiple instances of one model in parallel to improve GPU utilization
- count: number of instances of the model
- kind: KIND_GPU or KIND_CPU
- gpus: which GPU IDs to use; by default all GPUs are used
instance_group [{ count: 1, gpus: [0], kind: KIND_GPU }]
instance_group [{ count: 2, kind: KIND_CPU }]
- Multiple groups can be configured to run on different devices:
instance_group [ { count: 2, kind: KIND_CPU }, { count: 1, gpus: [0], kind: KIND_GPU }, { count: 2, gpus: [1,2], kind: KIND_GPU } ]
scheduling - scheduling policy
Default: behaves as in the examples above (requests are executed as they arrive)
dynamic_batching: automatically merges requests into larger batches to improve throughput
dynamic_batching { preferred_batch_size: [2,4,8,16] }
dynamic_batching { preferred_batch_size: [2,4,8,16] max_queue_delay_microseconds: 100 }
max_queue_delay_microseconds: time limit for assembling a batch
Sequence Batcher
- Guarantees that all requests belonging to the same sequence are routed to the same model instance
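A minimal sketch of enabling it in config.pbtxt (the idle-timeout value is an arbitrary assumption):

sequence_batching {
  max_sequence_idle_microseconds: 5000000   # assumed value: release a sequence after 5 s of inactivity
}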
Ensemble Scheduler
- https://github.com/triton-inference-server/server/blob/main/docs/user_guide/architecture.md#ensemble-models
- Stitches separate models together into a single pipeline
Ensemble folder: https://forums.developer.nvidia.com/t/triton-ensemble-model-version/182635
path/to/models
├── ensemble_name
│   ├── config.pbtxt
│   └── 1 (empty)
├── MODEL1
│   ├── config.pbtxt
│   └── 1
└── MODEL2
    ├── config.pbtxt
    └── 1
name:"ensemble_model"
platform:"ensemble" // 平台指定为ensemble
max_batch_size:1
input[
{
name:"IMAGE"
data_type:TYPE_STRING
dims:[1]
}
]
output[
{
name:"CLASSIFICATION"
data_type:TYPE_FP32
dims:[1000]
},
{
name:"SEGMENTATION"
data_type:TYPE_FP32
dims:[3,224,224]
}
]
ensemble_scheduling{
step[
{
model_name :"image_preprocess_model"
model_version:-1
input_map { //*_map定义从模型到ensemble_model中的名称映射
key:"RAW_IMAGE" //key is real input/output name of "image_preprocess_model"
value:"IMAGE" //第一个步骤的输入名称和上边name:"ensemble_model"的一致
}
output_map {
key:"PREPROCESSED_OUTPUT" //key is real input/output name of "image_preprocess_model"
value:"preprocessed_image" // 第一步的输出在ensemble_model的新名称
}
},
{
model_name :"classification_model"
model_version:-1
input_map {
key:"FORMATTED_IMAGE"
value:"preprocessed_image" //名称用于step之间的链接
}
output_map {
key:"CLASSIFICATION_OUTPUT"
value:"CLASSIFICATION" // 与output名称一致
}
},
{
model_name :"segmentation_model"
model_version:-1
input_map {
key:"FORMATTED_IMAGE"
value:"preprocessed_image"
}
output_map {
key:"SEGMENTATION_OUTPUT"
value:"SEGMENTATION" // 与output名称一致
}
}
]
}
optimization policy - performance
- Two easy-to-use optimization options, one for ONNX and one for TensorFlow (e.g. delegating execution to TensorRT; see the sketch below)
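A sketch of enabling the TensorRT execution accelerator for an ONNX or TensorFlow model (the precision_mode parameter value is an assumption):

optimization {
  execution_accelerators {
    gpu_execution_accelerator: [
      {
        name: "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }   # assumed parameter value
      }
    ]
  }
}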
MODEL WARMUP - warm the model up before it starts serving
model_warmup [
  {
    name: "warmup_requests"
    batch_size: 64
    inputs {
      key: "input"    # must match the model's real input name ("input" here is a placeholder)
      value: {
        data_type: TYPE_FP32
        dims: [ 229, 229, 3 ]
        random_data: true    # fill the warmup tensor with random data
      }
    }
  }
]